Abstract

This  thesis  examines  a  complete  design  framework  for  a  real-time,  autonomous  system  with 
specialized VLSI hardware for computing 3-D camera motion. In the proposed architecture,
the  first  step  is  to  determine  point  correspondences  between  two  images.  Two  processors,  a 
CCD  array  edge  detector  and  a  mixed  analog/digital  binary  block  correlator,  are  proposed 
for  this  task.  The  report  is  divided  into  three  parts.  Part  I  covers  the  algorithmic  analysis; 
part  II  describes  the  design  and  test  of  a  32x32  CCD  edge  detector  fabricated  through 
MOSIS; and part III compares the design of the mixed analog/digital correlator to a fully
digital  implementation. 
Abstract

Computing  the  general  3-D  motion  of  a  camera  from  the  changes  in  its  images  as  it 
moves  through  the  environment  is  an  important  problem  in  autonomous  navigation  and 
in  machine  vision  in  general.  The  goal  of  this  research  has  been  to  establish  the  complete 
design  framework  for  a  small,  autonomous  system  with  specialized  analog  and  digital  VLSI 
hardware  for  computing  3-D  camera  motion  in  real-time.  Such  a  system  would  be  suitable 
for  mounting  on  mobile  or  remote  platforms  that  cannot  be  tethered  to  a  computer  and  for 
which  the  size,  weight  and  power  consumption  of  the  components  are  critical  factors. 

Combining  algorithmic  design  with  circuit  design  is  essential  for  building  a  robust  size- 
and  power-efficient  system,  as  there  are  constraints  imposed  both  by  technology  and  by 
the  nature  of  the  problem  which  can  be  more  efficiently  satisfied  jointly.  The  first  part  of 
this  thesis  is  thus  devoted  to  the  analysis  and  development  of  the  algorithms  used  in  the 
system  and  implemented  by  the  special  processors.  Among  the  major  theoretical  results 
presented  in  Part  I  are  the  development  of  the  multi-scale  veto  edge  detection  algorithm,  the 
derivation  of  a  simplified  method  for  solving  the  motion  equations,  and  a  complete  analysis 
of  the  effects  of  measurement  errors  on  the  reliability  of  the  motion  estimates. 

In the proposed system architecture, the first step is to determine point correspondences
between two successive images. Two specialized processors, the first a CCD array edge
detector implementing the multi-scale veto algorithm, and the second a mixed analog/digital
binary block correlator, are proposed and designed for this task. A prototype CCD edge
detector was fabricated through MOSIS, and based on the test results from this chip,
improvements are suggested so that a full-size focal-plane processor can be built. The design
of the mixed analog/digital correlator is compared with a fully digital implementation and
is seen to yield a significant reduction in silicon area without compromising operating speed.
In  the  conclusions,  the  theoretical  and  experimental  results  from  the  different  parts  of  this 
thesis  are  combined  into  a  single  design  proposal  for  the  complete  motion  system. 
Chapter 1


Introduction 


It  is  difficult  to  overstate  the  potential  usefulness  of  an  automated  system  to  compute 
motion from visual information for numerous applications involving the control and navigation
of moving vehicles. Deducing 3-D motion by measuring the changes in successive images
taken  by  a  camera  moving  through  the  environment  involves  determining  the  perspective 
transformation  between  the  coordinate  systems  defined  by  the  camera’s  principal  axes  at 
its  different  locations.  Solving  this  problem  is  important  not  only  for  motion  estimation, 
but  also  for  determining  depth  from  binocular  stereopairs.  It  is  in  fact  equivalent  to  the 
classic problem in photogrammetry of relative orientation, for which methods were developed
by cartographers over a hundred years ago to measure the topography of large scale
land  masses  [1],  [2],  [3],  [4],  [5]. 

Computing  relative  orientation  involves  two  difficult  subproblems.  First,  corresponding 
points  must  be  identified  in  the  two  images,  and  second,  the  nonlinear  equations  defining  the 
perspective transformation must be inverted to solve for the parameters of the motion. Not
all image pairs allow an unambiguous determination of their relative orientation, however.
For  some  configurations  of  points  there  is  not  a  unique  solution  to  the  motion  equations 
and for many others the problem is ill-conditioned [6], [7], [3], [4], [8]. The methods developed
long ago for cartography relied on considerable human intervention to overcome these
difficulties. Large optical devices known as stereoplotters were invented to align matching
features using a floating mark positioned by an operator, while general knowledge of
the geometry of the scene and the camera positions was used to aid in solving the motion
equations¹.

Efforts  to  construct  autonomous  systems  have  also  been  limited  by  the  complexity  of  the 
task.  At  present,  machine  vision  algorithms  for  computing  camera  motion  and  alignment 
have  reached  a  level  of  sophistication  in  which  they  can  operate  under  special  conditions  in 
restricted environments. Among the systems which have been developed, however, there is a
strong  correlation  between  robustness  and  the  amount  of  computational  resources  employed. 
Two  approaches  which  are  commonly  taken  are  to  either  impose  very  restrictive  conditions 
on  the  type  and  amount  of  relative  motion  allowed,  in  which  case  simple  algorithms  can  be 
used  to  yield  qualitatively  correct  results  as  long  as  the  basic  assumptions  are  not  violated; 
or  to  relax  the  restrictions  and  therefore  implement  the  system  with  complex  algorithms 
that  require  powerful  processors. 

The goal of this thesis is to go beyond these limitations and to design a system that is
unrestrictive and that uses minimal hardware. Specifically, the objectives of such a
system  are  the  following: 

•  The  complete  system  should  be  physically  small  and  should  operate  with  minimal 
power.  This  is  particularly  important  if  it  is  to  be  used  in  remote  or  inaccessible 
environments. 

•  It  should  allow  real-time,  frame-rate  operation. 

•  It  must  be  able  to  either  produce  an  accurate  estimate  of  the  motion  for  the  majority 
of  situations  it  is  likely  to  encounter  or  to  recognize  and  report  that  a  reliable  estimate 
cannot  be  obtained  from  the  given  images. 

•  The  system  should  be  self-contained,  in  the  sense  that  neither  external  processing  nor 
outside  intervention  is  required  to  determine  accurate  estimates  of  the  camera  motion. 

A central tenet of this thesis is that in order to meet the first two requirements, specialized
VLSI processors, combining both analog and digital technology, are needed to
perform specific tasks within the system. Clearly, meeting the last two requirements does
not necessitate special hardware since they influence only the choice of the algorithms to
be implemented. Obviously, any algorithm which can be wired into a circuit can be programmed
on general purpose digital hardware. The motivation for designing specialized
processors is the idea that in doing so a significant reduction in size and power consumption
can be achieved over general purpose hardware.

¹Cartography is still a labor intensive process, although in the interest of developing geographic
information systems (GIS), there have been many efforts in the last decade to automate mapping techniques
by applying algorithms developed for machine vision ([9], see also the April 1983 issue of Photogrammetric
Engineering and Remote Sensing).

There  are  many  aspects  to  a  completely  general  system  for  computing  camera  motion 
and  alignment,  and  it  is  necessary  to  define  the  limits  of  this  study.  Specifically, 

•  Only  image  sequences  from  passive  navigation  will  be  examined.  In  other  words,  it 
is  always  assumed  that  the  environment  is  static  and  all  differences  observed  in  the 
images  are  due  to  differences  in  camera  position.  The  case  of  multiple  independently 
moving  objects  in  the  scene  will  not  be  explicitly  addressed. 

•  The  system  design  is  based  entirely  on  the  problem  of  estimating  motion  from  two 
frames  only.  Many  researchers  [10],  [11],  [12]  have  proposed  the  use  of  multiple  frames 
in  order  to  improve  reliability,  on  the  grounds  that  the  results  from  two  frames  are 
overly  sensitive  to  error  and  are  numerically  unstable.  The  philosophy  of  the  present 
approach  is  that  it  is  necessary  to  build  a  system  which  can  extract  the  best  results 
possible  from  two  frames  in  order  to  make  a  multi-frame  system  even  more  reliable. 
Nothing in the design of the present system will prevent it from being used as a module
within  a  more  comprehensive  multi-frame  system. 

The  goal  of  this  thesis  is  not  to  build  the  complete  motion  system,  but  to  develop 
the  theory  on  which  it  is  based,  and  to  design  the  specialized  processors  needed  for  its 
operation.  This  thesis  is  divided  into  three  major  parts.  Part  I  covers  the  theoretical  issues 
of  selecting  and  adapting  the  algorithms  to  be  implemented  by  the  special  processors.  It 
also  includes  a  complete  analysis  of  the  numerical  stability  of  the  motion  algorithm  and 
of  the  sensitivity  of  its  estimates  to  errors  in  the  data.  Parts  II  and  III  are  concerned 
with the design of the processors needed for determining point correspondences. One of the
conclusions  of  Part  I  is  that  matching  edges  in  the  two  images  by  binary  block  correlation 
is  the  most  suitable  method  for  implementation  in  VLSI.  Part  II  describes  a  prototype  edge 
detector  built  in  CCD-CMOS  technology  which  implements  the  multi-scale  veto  algorithm 
presented  in  Chapter  5,  while  Part  III  examines  the  benefits  of  combining  analog  and  digital 
processing  to  design  an  area-efficient  edge  matching  circuit. 


Part  I 

Algorithms  and  Theory 
Chapter 2


Methods  for  Computing  Motion  and  Structure 


Computing motion and structure from different views involves two operations: matching
features in the different images, and solving the motion equations for the rigid body
translation and rotation which best describe the observed displacements of brightness patterns
in the image plane. Methods can be grouped into two categories according to whether
features  in  the  two  images  are  matched  explicitly  or  implicitly.  Explicit  methods  generate 
a  discrete  set  of  feature  correspondences  and  solve  the  motion  equations  using  the  known 
coordinates  of  the  pairs  of  matched  points  in  the  set.  Implicit  methods  formulate  the  motion 
equations  in  terms  of  the  temporal  and  spatial  derivatives  of  brightness  and  the  incremental 
displacements  in  the  image  plane,  or  optical  flow.  In  order  to  avoid  explicit  matching,  these 
methods  incorporate  additional  constraints,  such  as  brightness  constancy  and  smoothness 
of  the  optical  flow,  and  derive  the  motion  from  a  global  optimization  procedure. 

There  are  advantages  and  weaknesses  to  both  approaches.  Explicit  methods  require  few 
assumptions  other  than  rigid  motion;  however,  they  must  first  solve  the  difficult  problem  of 
finding  an  accurate  set  of  point  correspondences.  In  addition,  although  more  of  a  concern 
for  determining  structure  from  binocular  stereo  than  for  computing  motion,  depth  can  only 
be  recovered  for  points  in  the  set.  Implicit  methods  circumvent  the  correspondence  problem 
but  in  exchange  must  make  more  restrictive  assumptions  on  the  environment.  Since  they  are 
based  on  approximating  brightness  derivatives  from  sampled  data,  they  are  both  sensitive 
to  noise  and  sensor  variation  and  susceptible  to  aliasing. 

In  the  interest  of  removing  as  many  restrictions  as  possible,  an  explicit  approach  has 
been  adopted  for  the  present  system.  Details  of  the  specific  methods  which  will  be  used  to 
perform the tasks of finding point correspondences and solving the motion equations will
be  described  in  the  following  chapters.  In  this  chapter,  in  order  to  situate  this  research 
with  respect  to  related  work,  the  basic  methods  for  computing  motion  and  alignment  are 
presented  in  a  more  general  context.  I  will  first  rederive  the  fundamental  equations  of 
perspective  geometry,  presenting  the  notation  which  is  used  throughout  the  thesis,  and  will 
then  discuss  several  of  the  more  significant  algorithms  which  have  been  developed  for  both 
the  explicit  and  the  implicit  approaches.  Finally,  I  will  review  several  previous  and  ongoing 
efforts to build systems in VLSI based on these different methods.

2.1  Basic  Equations 

In  order  to  express  the  motion  equations  in  terms  of  image  plane  coordinates,  it  is  first 
necessary  to  formulate  the  relation  between  the  3-D  coordinates  of  objects  in  the  scene  and 
their  2-D  projections.  Once  the  motion  is  known,  the  projection  equations  can  be  inverted 
to  yield  the  3-D  coordinates  of  the  features  which  were  matched  in  the  images. 

The  exact  projective  relation  for  real  imaging  systems  is  in  general  nonlinear.  However, 
it  can  usually  be  well  approximated  by  a  linear  model.  If  greater  precision  is  needed, 
nonlinearities  can  be  accounted  for  either  by  adding  higher  order  terms  or  by  pre-warping 
the  image  plane  coordinates  to  fit  the  linear  model.  Of  the  two  choices  most  commonly  used 
for  the  basic  linear  relation — orthographic  and  perspective — only  perspective  projection  can 
meet  the  requirements  of  the  present  system.  Orthographic  projection,  which  approximates 
rays  from  the  image  plane  to  objects  in  the  scene  as  parallel  straight  lines,  is  the  limiting  case 
of perspective projection as the field of view goes to zero or as the distance to objects in
the  scene  goes  to  infinity.  Orthographic  projection  has  often  been  used  in  machine  vision  for 
the  recovery  of  structure  and  motion  [13],  [14]  because  it  simplifies  the  motion  equations. 
In  the  orthographic  model  the  projected  coordinates  are  independent  of  depth  and  are 
therefore  uncoupled.  For  the  same  reason,  however,  it  is  impossible  to  uniquely  determine 
motion  from  two  orthographic  views  [15]. 

2.1.1  Perspective  geometry 

Under the assumption of perfect perspective projection, such as would be obtained with
an ideal pinhole camera, we define the camera coordinate system as shown in Figure 2-1
with origin at the center of projection. The image plane is perpendicular to the $z$ axis and is
located at a distance $f$, which is the effective focal length, from the origin. In a real camera,
of course, the image sensor is located behind the optical center of projection at $z = -f$. It
is customary, however, to represent the image plane as shown in the diagram in order to
avoid the use of negative coordinates. The point where the $z$, or optical, axis pierces the
image plane is referred to as the principal point.

Figure 2-1: Geometry of perspective projection.

Let $\mathbf{p} = (X, Y, Z)^T$ represent the position vector in the camera coordinate system of a
point $P$ in the scene and let $\tilde{\mathbf{p}} = (X/Z, Y/Z, 1)^T$ denote the 2-dimensional homogeneous
representation of $\mathbf{p}$. Two world points $P_i$ and $P_j$ are projectively equivalent with respect
to a plane perpendicular to the $z$-axis if and only if $\tilde{\mathbf{p}}_i = \tilde{\mathbf{p}}_j$. For the world point $P$ to
be imaged on the plane $z = f$ at $P'$, whose coordinates are $(x, y, f)$, $P$ and $P'$ must be
projectively equivalent. In other words

$$\frac{x}{f} = \frac{X}{Z}, \quad \text{and,} \quad \frac{y}{f} = \frac{Y}{Z} \tag{2.1}$$


Since image irradiance, or brightness, is always sampled discretely by an array of photosensors,
it is convenient to define a secondary set of coordinates, $(m_x, m_y)$, on the array
of picture cells such that the center of each pixel is located at integer values of $m_x$ and
$m_y$. The vector $\mathbf{m} = (m_x, m_y, 1)^T$ is related by a linear transformation matrix to the
2-D homogeneous representation, in the camera coordinate system, of all points which are
projectively equivalent to $(x, y, f)$

$$\mathbf{m} = K_c\,\tilde{\mathbf{p}} \tag{2.2}$$


Under the conditions illustrated in Figure 2-1, $K_c$ has the special form

$$K_c = \begin{pmatrix} f/s_x & 0 & m_{x0} \\ 0 & f/s_y & m_{y0} \\ 0 & 0 & 1 \end{pmatrix} \tag{2.3}$$

where $f$ is the effective focal length; $s_x$ and $s_y$ are the physical distances, measured in the
same units as $f$, between pixel centers along the orthogonal $x$- and $y$-axes; and $(m_{x0}, m_{y0})$
is  the  location,  in  pixel  coordinates,  of  the  principal  point.  The  matrix  Kc  is  referred  to 
as  the  internal  camera  calibration  matrix  and  must  be  known  before  scene  structure  and 
camera  motion  can  be  recovered  from  the  apparent  motion  of  brightness  patterns  projected 
onto  the  image  plane. 
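
As an illustration of (2.2) and (2.3), the following Python sketch maps a point given in camera coordinates to pixel coordinates. The function name `project_to_pixels` and the numerical camera parameters are hypothetical, chosen only for this example, and the ideal form of the calibration matrix is assumed.

```python
def project_to_pixels(point, f, sx, sy, mx0, my0):
    """Map a camera-frame point (X, Y, Z) to pixel coordinates (mx, my)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # 2-D homogeneous representation (X/Z, Y/Z, 1) of the point
    p_tilde = (X / Z, Y / Z, 1.0)
    # Internal calibration matrix in the ideal form of (2.3)
    Kc = [
        [f / sx, 0.0, mx0],
        [0.0, f / sy, my0],
        [0.0, 0.0, 1.0],
    ]
    # m = Kc * p_tilde, equation (2.2)
    m = [sum(Kc[i][j] * p_tilde[j] for j in range(3)) for i in range(3)]
    return m[0], m[1]


# Hypothetical camera: 8 mm focal length, 10 micron pixel pitch,
# principal point at pixel (16, 16).
mx, my = project_to_pixels((0.1, -0.05, 2.0), f=8e-3, sx=10e-6, sy=10e-6,
                           mx0=16.0, my0=16.0)
```

Because the focal length and the pixel pitch enter only through the ratios $f/s_x$ and $f/s_y$, any consistent unit of length can be used for them.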

In  real  devices,  Kc  seldom  has  exactly  the  form  of  (2.3)  due  to  factors  such  as  the 
misalignment of the image sensor and spherical aberrations in the lens. Finding the appropriate
transformation is a difficult problem, and consequently numerous methods have been
developed,  involving  varying  degrees  of  complexity,  to  determine  internal  calibration  [16], 
[17],  [18],  [19],  [20],  [21].  Discussing  these  methods,  however,  goes  well  beyond  the  scope  of 
this  thesis,  and  so  it  will  be  assumed  for  present  purposes  that  the  calibration  is  known. 


2.1.2  The  epipolar  constraint 

Suppose we have two images taken at two different camera positions, which we will refer
to as right and left. Let $\mathbf{p}_r$ and $\mathbf{p}_\ell$ denote the position vectors with respect to the right
and left coordinate systems to a fixed point $P$ in the environment, $\mathbf{p}_r = (X_r, Y_r, Z_r)^T$ and
$\mathbf{p}_\ell = (X_\ell, Y_\ell, Z_\ell)^T$. Assuming a fixed environment so that rigid body motion is applicable,
$\mathbf{p}_r$ and $\mathbf{p}_\ell$ are related by

$$\mathbf{p}_r = R\,\mathbf{p}_\ell + \mathbf{b} \tag{2.4}$$

where $R$ denotes an orthonormal rotation matrix, and $\mathbf{b}$ is the baseline vector connecting
the origins of the two systems. A necessary condition for the vectors $\mathbf{p}_r$ and $\mathbf{p}_\ell$ to intersect
at $P$ is that they be coplanar with the baseline, $\mathbf{b}$, or equivalently, that the triple product
of the three vectors vanish

$$\mathbf{p}_r \cdot (\mathbf{b} \times R\,\mathbf{p}_\ell) = 0 \tag{2.5}$$

Given $R$, $\mathbf{b}$, and the coordinates $(x_\ell, y_\ell, f)$ of the point $P'_\ell$, which is the projection of $P$ in
the left image, equation (2.5) defines a line in the right image upon which the corresponding
point $P'_r$ at $(x_r, y_r, f)$ must lie. This line, known as the epipolar line, is the intersection of
the image plane with the plane containing the point $P$ and the baseline $\mathbf{b}$. The position of
$P'_r$ on the epipolar line is determined by the depth, or $z$-coordinate, of $P$ in the left camera
system. To see this, define the variables $\xi$, $\eta$, and $\phi$ to represent the components of the
rotated homogeneous vector $R\,\tilde{\mathbf{p}}_\ell$

$$R\,\tilde{\mathbf{p}}_\ell = (\xi, \eta, \phi)^T \tag{2.6}$$

Then equation (2.4) can be expressed in component form as

$$\begin{pmatrix} X_r \\ Y_r \\ Z_r \end{pmatrix} = Z_\ell \begin{pmatrix} \xi \\ \eta \\ \phi \end{pmatrix} + \begin{pmatrix} b_x \\ b_y \\ b_z \end{pmatrix} \tag{2.7}$$

The projection of $P$ onto the right image is found from

$$\frac{x_r}{f} = \frac{X_r}{Z_r} = \frac{Z_\ell\,\xi + b_x}{Z_\ell\,\phi + b_z} \tag{2.8}$$

and

$$\frac{y_r}{f} = \frac{Y_r}{Z_r} = \frac{Z_\ell\,\eta + b_y}{Z_\ell\,\phi + b_z} \tag{2.9}$$


$P'_r$ thus varies along the epipolar line from $f(b_x/b_z, b_y/b_z)$ when $Z_\ell = 0$ to $f(\xi/\phi, \eta/\phi)$
when $Z_\ell = \infty$. The first point, known as the epipole, is independent of $x_\ell$ and $y_\ell$, and is
therefore common to all epipolar lines. By rewriting (2.4) as

$$\mathbf{p}_\ell = R^T \mathbf{p}_r + \mathbf{b}' \tag{2.10}$$

where $\mathbf{b}' = -R^T\mathbf{b}$, a similar relation can be obtained for the coordinates of the point $P'_\ell$, the
projection of $P$ onto the left image plane, in terms of $Z_r$ and the components of the rotated
vector $R^T\tilde{\mathbf{p}}_r$. The geometry of the epipolar transformation is illustrated in Figure 2-2.
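
The behavior described by (2.8) and (2.9) can be checked numerically. The sketch below, which assumes a hypothetical motion (a 5 degree rotation about the $y$ axis and an arbitrary baseline), projects one left-image ray into the right image at several depths and tests that the resulting points lie on a single line through the epipole. All names and numerical values are illustrative assumptions.

```python
import math


def mat_vec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def right_image_point(p_tilde_left, depth, R, b, f):
    """Project the left ray at depth Z_l into the right image, per (2.8)-(2.9)."""
    xi, eta, phi = mat_vec(R, p_tilde_left)
    denom = depth * phi + b[2]
    return (f * (depth * xi + b[0]) / denom, f * (depth * eta + b[1]) / denom)


def is_collinear(p, q, r, tol=1e-9):
    """True if three image points lie on one line (2-D cross product near zero)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) < tol


# Hypothetical motion: 5 degree rotation about the y axis plus a baseline b.
theta = math.radians(5.0)
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
b = [0.8, 0.1, 0.2]
f = 1.0
p_tilde_left = [0.1, -0.2, 1.0]          # (x_l / f, y_l / f, 1)

epipole = (f * b[0] / b[2], f * b[1] / b[2])
xi, eta, phi = mat_vec(R, p_tilde_left)
far_limit = (f * xi / phi, f * eta / phi)              # limit as Z_l grows without bound
near = right_image_point(p_tilde_left, 0.0, R, b, f)   # coincides with the epipole
mid = right_image_point(p_tilde_left, 3.0, R, b, f)    # an intermediate depth
on_epipolar_line = is_collinear(epipole, mid, far_limit)
```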
Figure 2-2: Epipolar geometry.


Binocular stereopsis is based on the fact that the depth of objects in the scene can be
recovered from their projections in two images with a known relative orientation. From
(2.8) and (2.9) it is easily seen that

$$Z_\ell = \frac{f b_x - b_z x_r}{\phi x_r - f\xi} = \frac{f b_y - b_z y_r}{\phi y_r - f\eta} \tag{2.11}$$

The quantity $\phi x_r - f\xi$ is known as the horizontal disparity, $d_H$, and the quantity $\phi y_r - f\eta$
as the vertical disparity, $d_V$. If $R = I$ and $\mathbf{b} = (|\mathbf{b}|, 0, 0)^T$, then $\xi = x_\ell/f$, $\phi = 1$, and equation
(2.11) reduces to the familiar parallel geometry case in which

$$Z_r = Z_\ell = \frac{f\,|\mathbf{b}|}{x_r - x_\ell} \tag{2.12}$$

and for which the vertical disparity is necessarily zero.
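
A minimal sketch of the depth computation in (2.11) follows. The function `depth_from_disparity` is a hypothetical helper written for this example; it returns one estimate from the horizontal disparity and one from the vertical disparity, and reports `None` for either form whose disparity vanishes, as happens for the vertical disparity in the parallel geometry case.

```python
def depth_from_disparity(x_r, y_r, xi, eta, phi, b, f, eps=1e-12):
    """Evaluate both forms of (2.11); a form is None when its disparity vanishes."""
    d_h = phi * x_r - f * xi      # horizontal disparity
    d_v = phi * y_r - f * eta     # vertical disparity
    z_from_h = (f * b[0] - b[2] * x_r) / d_h if abs(d_h) > eps else None
    z_from_v = (f * b[1] - b[2] * y_r) / d_v if abs(d_v) > eps else None
    return z_from_h, z_from_v


# Parallel geometry: R = I and b = (|b|, 0, 0), so xi = x_l/f, eta = y_l/f, phi = 1.
f, baseline = 1.0, 0.1
X, Y, Z = 0.5, 0.2, 4.0                      # hypothetical point in the left frame
x_l, y_l = f * X / Z, f * Y / Z
x_r, y_r = f * (X + baseline) / Z, f * Y / Z
z_h, z_v = depth_from_disparity(x_r, y_r, x_l / f, y_l / f, 1.0,
                                (baseline, 0.0, 0.0), f)
# z_h recovers Z up to rounding; z_v is None because the vertical disparity is zero.
```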


2.2  Computing  Motion  by  Matching  Features 

In  this  section,  we  will  examine  only  those  methods  which  compute  motion  from  point 
correspondences.  Although  algorithms  using  higher  level  features,  such  as  lines  and  planes 
have  been  proposed  ([22],  [23],  [24],  [25]),  these  usually  require  more  than  two  views  and 
also  are  not  as  practical  for  hardware  implementation. 

To  compute  camera  motion,  or  relative  orientation,  we  need  to  find  the  rotation  and 
baseline  vector  that  best  describe  the  transformation  between  the  right  and  left  camera 
systems. We assume that we have a set of $N$ pairs, $\{(\mathbf{r}_i, \boldsymbol{\ell}_i)\}$, $i = 1, \ldots, N$, where $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$
are the position vectors of the points $P'_{r_i}$ and $P'_{\ell_i}$ which are projectively equivalent, in the
right and left systems respectively, to the same world point $P$.

The  fundamental  equation  for  computing  the  coordinate  transformation  between  the  two 
camera  systems  is  the  coplanarity  constraint  given  by  equation  (2.5).  Since  this  equation  is 
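homogeneous, it can be written in terms of $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$ as

$$\mathbf{r}_i \cdot (\mathbf{b} \times R\,\boldsymbol{\ell}_i) = 0 \tag{2.13}$$

Equation (2.13) is unaffected by the lengths of any of the vectors $\mathbf{r}_i$, $\boldsymbol{\ell}_i$, or $\mathbf{b}$. We usually
set $|\mathbf{b}| = 1$ so that the baseline length becomes the unit of measure for all distances in the
scene. The vectors $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$ may be set equal to the homogeneous vectors $\tilde{\mathbf{p}}_{r_i}$ and $\tilde{\mathbf{p}}_{\ell_i}$

$$\mathbf{r}_i = \tilde{\mathbf{p}}_{r_i} \quad \text{and,} \quad \boldsymbol{\ell}_i = \tilde{\mathbf{p}}_{\ell_i} \tag{2.14}$$

or may also be assigned unit length:

$$\mathbf{r}_i = \frac{\mathbf{p}_{r_i}}{|\mathbf{p}_{r_i}|} \quad \text{and} \quad \boldsymbol{\ell}_i = \frac{\mathbf{p}_{\ell_i}}{|\mathbf{p}_{\ell_i}|} \tag{2.15}$$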


The  second  choice  is  often  referred  to  as  spherical  projection. 


2.2.1  Representing  rotation 

There  are  several  ways  to  represent  rotation,  including  orthonormal  matrices  as  in  (2.13). 
The  matrix  form  is  not  always  the  best  choice,  however,  as  it  requires  nine  coefficients,  even 
though  there  are  only  three  degrees  of  freedom.  A  rotation  is  completely  specified  by  the 
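pair $(\theta, \hat{\boldsymbol{\omega}})$, where $\theta$ represents the angle of rotation about the axis $\hat{\boldsymbol{\omega}} = (\omega_x, \omega_y, \omega_z)^T$, with
$|\hat{\boldsymbol{\omega}}| = 1$. The relation between the orthonormal matrix $R$ and $(\theta, \hat{\boldsymbol{\omega}})$ is given by

$$R = \begin{pmatrix}
\cos\theta + \omega_x^2(1-\cos\theta) & \omega_x\omega_y(1-\cos\theta) - \omega_z\sin\theta & \omega_x\omega_z(1-\cos\theta) + \omega_y\sin\theta \\
\omega_y\omega_x(1-\cos\theta) + \omega_z\sin\theta & \cos\theta + \omega_y^2(1-\cos\theta) & \omega_y\omega_z(1-\cos\theta) - \omega_x\sin\theta \\
\omega_z\omega_x(1-\cos\theta) - \omega_y\sin\theta & \omega_z\omega_y(1-\cos\theta) + \omega_x\sin\theta & \cos\theta + \omega_z^2(1-\cos\theta)
\end{pmatrix} \tag{2.16}$$

which can also be expressed as

$$R = \cos\theta\, I + (1 - \cos\theta)\,\hat{\boldsymbol{\omega}}\hat{\boldsymbol{\omega}}^T + \sin\theta\,\Omega_\times \tag{2.17}$$

where

$$\Omega_\times = \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} \tag{2.18}$$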


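Equation (2.17) leads directly to Rodrigues' well-known formula for the rotation of a vector $\boldsymbol{\ell}$

$$\begin{aligned}
R\,\boldsymbol{\ell} &= \cos\theta\,\boldsymbol{\ell} + (1 - \cos\theta)(\hat{\boldsymbol{\omega}} \cdot \boldsymbol{\ell})\,\hat{\boldsymbol{\omega}} + \sin\theta\,\hat{\boldsymbol{\omega}} \times \boldsymbol{\ell} \\
&= \boldsymbol{\ell} + \sin\theta\,\hat{\boldsymbol{\omega}} \times \boldsymbol{\ell} + (1 - \cos\theta)\,\hat{\boldsymbol{\omega}} \times (\hat{\boldsymbol{\omega}} \times \boldsymbol{\ell})
\end{aligned} \tag{2.19}$$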
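
The equivalence between the matrix form (2.17) and Rodrigues' formula (2.19) can be illustrated with a short Python sketch. The helper names below are hypothetical; the axis passed in is assumed to have unit length.

```python
import math


def cross(a, b):
    """Vector cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Vector dot product."""
    return sum(x * y for x, y in zip(a, b))


def rotate_rodrigues(v, axis, theta):
    """Rotate v by theta about the unit vector axis, first line of (2.19)."""
    c, s = math.cos(theta), math.sin(theta)
    k = (1.0 - c) * dot(axis, v)
    w_x_v = cross(axis, v)
    return [c * v[i] + k * axis[i] + s * w_x_v[i] for i in range(3)]


def rotation_matrix(axis, theta):
    """Build R from (2.17): cos I + (1 - cos) w w^T + sin Omega_x."""
    c, s = math.cos(theta), math.sin(theta)
    wx, wy, wz = axis
    omega_x = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    return [[c * (i == j) + (1.0 - c) * axis[i] * axis[j] + s * omega_x[i][j]
             for j in range(3)] for i in range(3)]


# A quarter turn about the z axis carries the x axis onto the y axis.
rotated = rotate_rodrigues([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], math.pi / 2)
```

The vector form needs only one dot product and one cross product per rotated vector, whereas the matrix form is preferable when many vectors are rotated by the same $R$.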


A  frequently  useful  and  compact  representation  of  rotation  is  the  unit  quaternion. 
Quaternions are vectors in $\mathbb{R}^4$ which may be thought of as the composition of a scalar
and 'vector' part [4].

$$\mathring{a} = (a_0, \mathbf{a}) \tag{2.20}$$

where $\mathbf{a} = (a_x, a_y, a_z)^T$ is a vector in $\mathbb{R}^3$. An ordinary vector $\mathbf{v}$ in $\mathbb{R}^3$ is represented in
quaternion form as

$$\mathring{v} = (0, \mathbf{v}) \tag{2.21}$$

A unit quaternion is one whose magnitude, defined as the square root of its dot product
with itself, is unity.

$$\mathring{q} \cdot \mathring{q} = 1 \tag{2.22}$$

Unlike vectors in $\mathbb{R}^4$, quaternions are endowed with special operations of multiplication and
conjugation,  and  thus  form  the  basis  of  a  complete  algebra.  The  fundamental  operations 
and  identities  of  quaternion  algebra  are  summarized  in  Appendix  A. 

The usefulness of quaternions lies in the simplicity with which rotation about an arbitrary
axis can be represented. Every unit quaternion may be written as

$$\mathring{q} = \left(\cos\frac{\theta}{2},\; \hat{\boldsymbol{\omega}}\sin\frac{\theta}{2}\right) \tag{2.23}$$

and the rotation of the vector $\mathbf{v}$ by an angle $\theta$ about $\hat{\boldsymbol{\omega}}$ by

$$\mathring{v}' = \mathring{q}\,\mathring{v}\,\mathring{q}^* \tag{2.24}$$

where $\mathring{q}^*$ represents the conjugate of $\mathring{q}$.

In  the  following  discussions  I  will  alternate  between  these  different  representations,  to 
use  whichever  form  is  best  suited  to  the  problem  at  hand. 
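
As a small illustration of (2.23) and (2.24), the sketch below rotates a vector with a unit quaternion. It assumes the standard Hamilton product, $(a_0, \mathbf{a})(b_0, \mathbf{b}) = (a_0 b_0 - \mathbf{a}\cdot\mathbf{b},\; a_0\mathbf{b} + b_0\mathbf{a} + \mathbf{a}\times\mathbf{b})$, as the multiplication rule; the function names are hypothetical.

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions stored as (scalar, x, y, z)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def q_conj(q):
    """Conjugate: negate the vector part."""
    return (q[0], -q[1], -q[2], -q[3])


def rotate_with_quaternion(v, axis, theta):
    """Rotate v by theta about the unit vector axis using (2.23) and (2.24)."""
    half = 0.5 * theta
    s = math.sin(half)
    q = (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)
    v_q = (0.0, v[0], v[1], v[2])            # quaternion form (2.21)
    rotated = q_mul(q_mul(q, v_q), q_conj(q))
    return list(rotated[1:])                 # the scalar part is zero up to rounding


# A quarter turn about the z axis carries the x axis onto the y axis.
rotated = rotate_with_quaternion([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], math.pi / 2)
```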

2.2.2  Exact  solution  of  the  motion  equations 

There  are  five  unknown  parameters  in  equation  (2.13),  two  for  the  direction  of  the 
baseline  and  three  for  the  rotation.  It  has  long  been  known  that  relative  orientation  can 
be  determined  from  a  minimum  of  five  points,  as  long  as  these  do  not  lie  on  a  degenerate 
surface.  Due  to  the  rotational  component,  however,  the  equations  are  nonlinear  and  must 
be solved by iterative methods. Furthermore, the five-point formulation admits multiple
solutions¹ [7], [4].

Thompson  [26]  first  showed  how  the  coplanarity  conditions  could  be  formulated  as  a  set 
of  nine  homogeneous  linear  equations,  and  Longuet-Higgins  [27]  proposed  an  algorithm  to 
derive  the  baseline  vector  and  rotation  matrix  from  the  solution  to  the  equations  obtained 
from  eight  point  correspondences.  This  algorithm  is  summarized  as  follows: 

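The first step is to rewrite the coplanarity constraint (2.13) as

$$\begin{aligned}
\mathbf{r}_i \cdot (\mathbf{b} \times R\,\boldsymbol{\ell}_i) &= -R\,\boldsymbol{\ell}_i \cdot (\mathbf{b} \times \mathbf{r}_i) \\
&= -\boldsymbol{\ell}_i^T R^T B_\times \mathbf{r}_i
\end{aligned} \tag{2.25}$$

where

$$B_\times = \begin{pmatrix} 0 & -b_z & b_y \\ b_z & 0 & -b_x \\ -b_y & b_x & 0 \end{pmatrix} \tag{2.26}$$

We define the matrix $E$ as

$$E = R^T B_\times \tag{2.27}$$

and order the components of $E$ as

$$E = \begin{pmatrix} e_1 & e_4 & e_7 \\ e_2 & e_5 & e_8 \\ e_3 & e_6 & e_9 \end{pmatrix} \tag{2.28}$$

¹Faugeras and Maybank [7] first proved that there are at most 10 solutions for the camera motion given
5 correspondences, thereby correcting a longstanding error by Kruppa [5] who had thought there were at
most 11.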


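Let $\mathbf{a}_i$ denote the $9 \times 1$ vector formed from the products of the components of $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$

$$\mathbf{a}_i = \left(\ell_{x_i} r_{x_i},\; \ell_{y_i} r_{x_i},\; \ell_{z_i} r_{x_i},\; \ell_{x_i} r_{y_i},\; \ell_{y_i} r_{y_i},\; \ell_{z_i} r_{y_i},\; \ell_{x_i} r_{z_i},\; \ell_{y_i} r_{z_i},\; \ell_{z_i} r_{z_i}\right)^T \tag{2.29}$$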


and let $\mathbf{e}$ denote the $9 \times 1$ vector of the elements of $E$. Then the coplanarity constraint for
each pair of rays results in an equation of the form

$$\mathbf{a}_i^T \mathbf{e} = 0 \tag{2.30}$$

Eight correspondences result in eight equations which can be solved to within a scale factor
for the elements of $E$. Given $E$, the baseline vector is identified as the eigenvector
corresponding to the zero eigenvalue of $E^T E$

$$E^T E\,\mathbf{b} = 0 \tag{2.31}$$

as can be seen from the fact that

$$E^T E = B_\times^T B_\times = I - \mathbf{b}\mathbf{b}^T \tag{2.32}$$
The rotation matrix $R$ is found from

$$R = B_\times E^T - \mathrm{Cof}(E^T)^T \tag{2.33}$$

where $\mathrm{Cof}(E^T)$ is the matrix of cofactors of $E^T$.
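
The properties of $E$ used above can be checked on synthetic data. The following sketch assumes that $B_\times$ denotes the cross-product matrix of $\mathbf{b}$, so that $B_\times \mathbf{v} = \mathbf{b} \times \mathbf{v}$, and uses a hypothetical motion with a unit baseline. It verifies that an exact correspondence satisfies $\boldsymbol{\ell}^T E\,\mathbf{r} = 0$, that $E\,\mathbf{b} = 0$, and that $E^T E = I - \mathbf{b}\mathbf{b}^T$ as in (2.32). It does not attempt the recovery of $R$ in (2.33).

```python
import math


def mat_mul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    """Transpose of a 3x3 matrix."""
    return [list(row) for row in zip(*A)]


def mat_vec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def cross_matrix(b):
    """Skew-symmetric matrix B_x with B_x v = b x v (assumed convention)."""
    return [[0.0, -b[2], b[1]], [b[2], 0.0, -b[0]], [-b[1], b[0], 0.0]]


# Hypothetical motion: 10 degree rotation about the y axis, unit baseline.
theta = math.radians(10.0)
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
b = [0.6, 0.0, 0.8]
E = mat_mul(transpose(R), cross_matrix(b))        # equation (2.27)

# One exact correspondence: left ray l at depth Z_l, right ray r from (2.4).
l = [0.1, -0.2, 1.0]
Z_l = 5.0
Rl = mat_vec(R, l)
r = [Z_l * Rl[i] + b[i] for i in range(3)]

Er = mat_vec(E, r)
residual = sum(l[i] * Er[i] for i in range(3))    # l^T E r, zero for exact data
Eb = mat_vec(E, b)                                # zero vector
EtE = mat_mul(transpose(E), E)                    # equals I - b b^T for |b| = 1
```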

2.2.3  Least-squares  methods 

If there is no error in the data, the Longuet-Higgins algorithm will give a unique solution
for the motion² except for certain configurations of points which lie on special surfaces [8],
[28]. It is extremely difficult, however, to obtain error-free data, particularly if the correspondences
are determined by an automatic procedure. It turns out that the 8-point linear
algorithm  is  extremely  unstable  in  the  presence  of  noise,  due  largely  to  the  fact  that  the 
equations  (2.30)  do  not  take  into  account  dependencies  between  the  elements  of  E,  and 
hence  their  solution  cannot  be  decomposed  into  the  product  form  of  equation  (2.27). 

Even  when  nonlinear  methods  are  used  to  solve  the  coplanarity  constraint  equations, 
the  solution  is  very  sensitive  to  noise  when  few  correspondences  are  used  [29].  With  error 
in  the  data,  the  ray  pairs  are  not  exactly  coplanar  and  equation  (2.13)  should  be  written  as 

$$R\,\boldsymbol{\ell}_i \cdot (\mathbf{r}_i \times \mathbf{b}) = \lambda_i \tag{2.34}$$

Instead of trying to solve the constraint equations exactly, it is better to find the solution
that minimizes the error norm

$$S = \sum_{i=1}^{N} \lambda_i^2 \tag{2.35}$$

A  somewhat  improved  approach  over  the  8-point  algorithm  was  proposed  by  Weng  et 
al.  [30]  based  on  a  modification  of  a  method  originally  presented  by  Tsai  and  Huang  [28]. 


²There is an intrinsic fourfold ambiguity to every solution; however, these are all counted as one. This
ambiguity  will  be  discussed  in  more  detail  in  Chapter  7. 
They defined an $N \times 9$ matrix $A$ as

$$A = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N)^T \tag{2.36}$$

such that $S = |A\mathbf{e}|^2$. The vector $\mathbf{e}$ which minimizes $S$ (subject to $|\mathbf{e}| = 1$) is the eigenvector of $A^T A$ with
the smallest eigenvalue. The baseline direction and rotation are derived from the resulting
matrix $E$ such that $\mathbf{b}$ minimizes $|E^T E\,\mathbf{b}|$ and $R$ is the orthonormal rotation matrix that
minimizes $|E - R^T B_\times|$.

This  method,  however,  also  neglects  the  dependencies  between  the  elements  of  E,  and 
consequently  is  still  very  sensitive  to  errors  in  the  data.  The  matrix  formed  from  the 
elements  of  the  vector  e  that  minimizes  |Ae|  is  not  necessarily  close  to  the  product  of  the 
matrices  R  and  Bx  which  correspond  to  the  true  motion. 
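
The eigenvector computation described above can be sketched numerically. This is only a minimal illustration under stated assumptions, not the formulation of [30]: each row of A is assumed to be the Kronecker product of a pair of corresponding rays, the synthetic data assume the ordering $E = B_\times R$, and the helper names `cross_matrix`, `rotation_about`, and `estimate_essential` are hypothetical. The vector e is taken as the right singular vector of A with the smallest singular value, which is the eigenvector of $A^T A$ with the smallest eigenvalue.

```python
import numpy as np

def cross_matrix(b):
    """Return the skew-symmetric matrix B such that B @ x == np.cross(b, x)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def rotation_about(axis, theta):
    """Rodrigues formula for a rotation of theta radians about the given axis."""
    K = cross_matrix(axis / np.linalg.norm(axis))
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def estimate_essential(rays_a, rays_b):
    """Unit vector e minimizing |A e|, returned as a 3x3 matrix.

    Assumed row layout: kron(a_i, b_i), so that row @ E.ravel() == a_i^T E b_i.
    """
    A = np.stack([np.kron(a, b) for a, b in zip(rays_a, rays_b)])
    _, singular_values, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3), singular_values

# Synthetic, noise-free correspondences (all numbers are illustrative).
rng = np.random.default_rng(0)
R_true = rotation_about(np.array([0.2, 1.0, 0.1]), 0.1)
b_true = np.array([1.0, 0.2, 0.1])
b_true /= np.linalg.norm(b_true)
p_left = rng.uniform([-2, -2, 4], [2, 2, 10], size=(20, 3))
p_right = p_left @ R_true.T + b_true           # p_r = R p_l + b

E_est, sv = estimate_essential(p_right, p_left)
E_true = cross_matrix(b_true) @ R_true         # p_r^T (B R) p_l = 0
# Cosine of the angle between the estimated and true matrices (sign ignored).
alignment = abs(np.sum(E_est * E_true)) / np.linalg.norm(E_true)
```

With error-free data the smallest singular value is numerically zero and `alignment` is close to 1. Adding noise to `p_right` is an easy way to observe the sensitivity discussed in the text, since nothing in the unconstrained solution forces `E_est` to factor into a rotation and a skew-symmetric matrix.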

Several  researchers  have  pointed  out  the  problems  of  computing  motion  by  unconstrained 
minimization  of  the  error  [12],  [3].  Horn  [3]  proposed  the  most  general  algorithm  to  solve 
the  direct  nonlinear  constrained  optimization  problem  iteratively.  This  method  was  later 
revised  and  reformulated  in  [4]  using  unit  quaternions. 

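The vectors $\mathbf{r}_i$, $\boldsymbol{\ell}_i$, and $\mathbf{b}$ are given in quaternion form by

$$\mathring{r}_i = (0, \mathbf{r}_i), \quad \mathring{\ell}_i = (0, \boldsymbol{\ell}_i), \quad \text{and} \quad \mathring{b} = (0, \mathbf{b}) \qquad (2.37)$$

while that of $\boldsymbol{\ell}'_i = R\boldsymbol{\ell}_i$ is given by

$$\mathring{\ell}'_i = \mathring{q}\,\mathring{\ell}_i\,\mathring{q}^* \qquad (2.38)$$

Using the identity (A.10) given in Appendix A, the triple product $\lambda_i$ (2.34) can be
written as

$$\lambda_i = \mathring{r}_i\mathring{b} \cdot \mathring{q}\,\mathring{\ell}_i\,\mathring{q}^*
= \mathring{r}_i\mathring{b}\,\mathring{q} \cdot \mathring{q}\,\mathring{\ell}_i
= \mathring{r}_i\mathring{d} \cdot \mathring{q}\,\mathring{\ell}_i \qquad (2.39)$$

where $\mathring{d} = \mathring{b}\,\mathring{q}$.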
In the latter version of Horn’s algorithm, S, the sum of squared errors, is written as a
first order perturbation about a given $\mathring{d}$ and $\mathring{q}$. The idea is to find the incremental changes
$\delta\mathring{d}$ and $\delta\mathring{q}$ which minimize the linearized equation subject to the constraints

$$\mathring{q} \cdot \mathring{q} = 1, \quad \mathring{d} \cdot \mathring{d} = 1, \quad \text{and} \quad \mathring{q} \cdot \mathring{d} = 0 \qquad (2.40)$$

The updated vectors $\mathring{q} + \delta\mathring{q}$ and $\mathring{d} + \delta\mathring{d}$ must also satisfy these conditions and, neglecting
second order terms, this results in the incremental constraints

$$\mathring{q} \cdot \delta\mathring{q} = 0, \quad \mathring{d} \cdot \delta\mathring{d} = 0, \quad \text{and} \quad \mathring{q} \cdot \delta\mathring{d} + \mathring{d} \cdot \delta\mathring{q} = 0 \qquad (2.41)$$

Differentiating the constrained objective function with respect to $\delta\mathring{q}$, $\delta\mathring{d}$, and the Lagrange
multipliers $\lambda$, $\mu$, $\nu$ associated with each of the constraints (2.41) and setting the result to
zero results in a linear system of equations of the form

$$J \begin{pmatrix} \delta\mathring{q} \\ \delta\mathring{d} \\ \lambda \\ \mu \\ \nu \end{pmatrix} = \mathbf{h} \qquad (2.42)$$

where the matrix J and the vector h are both known, given the current value of $\mathring{q}$ and $\mathring{d}$
(see [4] for details). Equation (2.42) can thus be solved for the 11 unknowns, which are the
four components each of $\delta\mathring{q}$ and $\delta\mathring{d}$ and the three Lagrange multipliers. After updating $\mathring{q}$ and
$\mathring{d}$ with the new increments $\delta\mathring{q}$ and $\delta\mathring{d}$, the procedure can be repeated until the percentage
change in the total error falls below some limit. This algorithm has been shown to be very
accurate and efficient in most cases for estimating motion, even with noisy correspondence
data, as long as there is a sufficiently large number of matches. As presented, however, it
is too complex to be implemented efficiently on a simple processor, given the need to solve
an 11×11 system of equations at each iteration. We will present a simplified adaptation of
this algorithm in Chapter 7.
2.3  Correspondenceless Methods

If the displacements in the image are small, they can be approximated by the time
derivatives of the position vectors to points in the scene. For small $\theta$, $\cos\theta \approx 1$, $\sin\theta \approx \theta$,
and equation (2.17) reduces to

$$R \approx I + \theta\,\Omega_\times \qquad (2.43)$$

The equation of rigid body motion (2.4) then becomes

$$\mathbf{p}_r = R\,\mathbf{p}_l + \mathbf{b} \approx \mathbf{p}_l + \theta(\hat{\boldsymbol{\omega}} \times \mathbf{p}_l) + \mathbf{b} \qquad (2.44)$$

$$v = \frac{-f b_y + y b_z}{Z} + \frac{\theta}{f}\left(\omega_y x y - \omega_x (y^2 + f^2) + \omega_z x f\right) \qquad (2.51)$$

first derived by Longuet-Higgins and Prazdny [31].

If it is assumed that the brightness E of a point in the image does not change as the
point moves, then the total derivative of brightness with time must be zero, that is,

$$\frac{dE}{dt} = \frac{\partial E}{\partial x}\frac{dx}{dt} + \frac{\partial E}{\partial y}\frac{dy}{dt} + \frac{\partial E}{\partial t} = E_x u + E_y v + E_t = 0 \qquad (2.52)$$

Equation (2.52) is known as the brightness change constraint equation.

There  are  two  approaches  to  using  these  equations.  The  first,  and  earliest  proposed,  is 
to  compute  the  optical  flow  over  the  entire  image  and  to  invert  (2.50)  and  (2.51)  to  find  the 
global  motion  and  depth  at  each  pixel.  The  second,  known  as  the  direct  approach,  skips  the 
computation  of  the  optical  flow  and  uses  only  the  constant  brightness  assumption  combined 
with the incremental rigid body equations. Neither approach requires finding explicit point
correspondences.

2.3.1  Optical  flow 

Horn and Schunck developed the first algorithm for determining optical flow from local
image brightness derivatives [32] based on minimizing the error in the brightness change
constraint equation

$$e_b = E_x u + E_y v + E_t \qquad (2.53)$$

Since there are two unknowns at each pixel, the constant brightness assumption is not
sufficient to determine u and v uniquely and a second constraint is required. Horn and
Schunck chose the smoothness of the optical flow and added a second error term

$$e_s^2 = |\nabla u|^2 + |\nabla v|^2 \qquad (2.54)$$

The total error to be minimized is therefore

$$e_{tot}^2 = \iint \left(e_b^2 + \lambda e_s^2\right) dx\,dy \qquad (2.55)$$

where $\lambda$ is a penalty term that weights the relative importance of the two constraints.
The functions u and v which minimize $e_{tot}^2$ for a given $\lambda$ can be found using the calculus of
variations.
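
As an illustration of how such a minimization is typically carried out in practice, the sketch below implements the familiar iterative update in which each flow component is replaced by its local average, corrected along the brightness gradient. It is a minimal sketch under assumptions, not the implementation of [32]: the Laplacian is approximated by a four-neighbor average minus the central value, the parameter `lam` plays the role of $\lambda$ up to a discretization constant, and the function names and the synthetic test pattern are hypothetical.

```python
import numpy as np

def neighbor_average(f):
    """Average of the four nearest neighbors, replicating values at the border."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(E1, E2, lam=0.01, n_iter=1000):
    """Estimate a dense flow (u, v) between two frames by smoothness-regularized iteration."""
    Ey, Ex = np.gradient(0.5 * (E1 + E2))   # axis 0 is y, axis 1 is x
    Et = E2 - E1                            # temporal derivative, one frame apart
    u = np.zeros_like(E1)
    v = np.zeros_like(E1)
    for _ in range(n_iter):
        u_bar = neighbor_average(u)
        v_bar = neighbor_average(v)
        # Brightness-constraint residual at the averaged flow, scaled by the
        # regularized squared gradient magnitude.
        residual = (Ex * u_bar + Ey * v_bar + Et) / (lam + Ex**2 + Ey**2)
        u = u_bar - Ex * residual
        v = v_bar - Ey * residual
    return u, v

def pattern(x, y):
    """Smooth synthetic texture used only for this demonstration."""
    return np.sin(0.3 * x) + np.sin(0.25 * y)

yy, xx = np.mgrid[0:64, 0:64].astype(float)
E1 = pattern(xx, yy)
E2 = pattern(xx - 0.5, yy - 0.2)            # pattern shifted by (0.5, 0.2) pixels
u, v = horn_schunck(E1, E2)
mean_u = u[8:-8, 8:-8].mean()
mean_v = v[8:-8, 8:-8].mean()
```

For this smooth, sub-pixel shift the interior averages `mean_u` and `mean_v` should come out near the imposed displacement. The test pattern has no depth discontinuities, so it does not exercise the failure mode discussed next.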

One  problem  with  computing  optical  flow  by  applying  a  smoothness  constraint  is  that 
the  flow  is  not  smooth  at  boundaries  between  objects  at  different  depths.  The  global 
optimization  procedure  causes  errors  generated  at  depth  discontinuities  to  propagate  to 
neighboring  regions  [33].  Segmenting  the  optical  flow  at  depth  discontinuities  would  appear 
to  be  the  solution  to  this  problem  except  that  one  does  not  know  a  priori  where  they  are. 
Murray  and  Buxton  [34]  proposed  incorporating  discontinuities  by  adding  line  processes 
to  the  objective  function,  using  an  idea  originated  by  Geman  and  Geman  for  segmenting 
gray-scale images by simulated annealing [35]. The resulting optimization problem is
non-convex, however, and requires special procedures to converge to a global minimum energy
state.

Once the optical flow is determined it is necessary to solve equations (2.50) and (2.51) to
find motion and depth. As was the case for the explicit methods, absolute distances cannot
be  recovered  since  scaling  Z  and  b  by  the  same  factor  has  no  effect  on  u  and  v.  Longuet- 
Higgins  and  Prazdny  [31]  showed  how  motion  and  depth  parameters  could  be  determined 
from  the  first  and  second  derivatives  of  the  optical  flow  after  first  computing  the  location 
of  the  epipole.  Heeger  et  al.  [36]  proposed  a  method  to  recover  the  motion  by  applying 
rotation  insensitive  center-surround  operators  that  allow  the  translational  and  rotational 
components  of  the  motion  to  be  determined  separately.  Ambiguities  in  interpreting  the 
optical  flow  in  the  case  of  special  surfaces  have  been  analyzed  in  [37],  [38],  [39],  and  [40]. 

2.3.2  Direct  methods 

The method of Horn and Schunck, or one of its variations, requires a great deal of
computation to determine the optical flow, which is only an intermediate step in obtaining
the actual parameters of interest. The direct approach of Horn and Weldon [41] and
Negahdaripour and Horn [42] avoids computing optical flow by substituting u and v from
equations (2.50) and (2.51) directly into (2.52). The brightness change constraint equation
is thus expressed as

$$(\mathbf{v} \cdot \hat{\boldsymbol{\omega}})\,\theta + \frac{1}{Z}(\mathbf{s} \cdot \mathbf{b}) = -E_t \qquad (2.56)$$

where

$$\mathbf{s} = \begin{pmatrix} -f E_x \\ -f E_y \\ x E_x + y E_y \end{pmatrix} \qquad (2.57)$$

and

$$\mathbf{v} = \begin{pmatrix} f E_y + y(x E_x + y E_y)/f \\ -f E_x - x(x E_x + y E_y)/f \\ y E_x - x E_y \end{pmatrix} \qquad (2.58)$$

Note that the vectors s and v are entirely computable from measurements in the image.

Assuming the image contains N pixels, there are N + 5 unknowns in equation (2.56):
the five independent parameters of $\mathbf{b}$ and $\theta\hat{\boldsymbol{\omega}}$ (recall that b is a unit vector due to the scale
factor ambiguity), and the N depth values Z. Since there is only one equation (2.56) for each
pixel, the problem is mildly underconstrained. Given two images it can be solved only for a
few special cases in which either the motion or the surface structure is restricted. With more
than two views of the same scene, however, the problem is no longer underconstrained [11].
It should be noted that it is never required to incorporate the assumption that the optical
flow is smooth, and hence the problems associated with discontinuities in the flow are
avoided.

Several methods have been developed to solve the special cases where the problem is not
underconstrained for two views. Three of these were developed by Negahdaripour and Horn
who gave a closed form solution for motion with respect to a planar surface [43]; showed how
the constraint that depth must be positive could be used to recover translational motion
when the rotation is zero, or is known [44]; and derived a method for locating the focus of
expansion [45]. Taalebinezhaad [46], [47] showed how motion and depth could be determined
in the general case by fixating on a single point in the image. He essentially demonstrated
that obtaining one point correspondence would provide enough information to enable the
general problem to be solved.

2.3.3  Limitations  of  correspondenceless  methods 

Methods for computing motion and depth from the local spatio-temporal derivatives
of image brightness must rely on specific assumptions in order to work. The most important
of these, on which all of the methods just described are based, is that brightness is
constant (2.52). Verri and Poggio [48] criticized differential methods on the grounds that
brightness constancy is often violated. Their arguments, however, were based on considering
shading effects which are important only for specular surfaces or when the motion
is large enough to significantly affect surface orientation. Furthermore, these effects dominate
only when the magnitude of the spatial brightness gradient is small. There are clearly
cases, such as the rotating uniform sphere or the moving point light source, as pointed out
by Horn [49] and others, in which the optical flow and the motion field are different. In
areas of the image where the brightness derivatives are small, it is difficult to constrain the
motion or to determine depth. However, this problem is not specific to differential methods.
Gennert and Negahdaripour [50] investigated the use of a linear transformation model to
account for brightness changes due to shading effects on lightly textured surfaces. Their
method was applied only to computing optical flow and involved modifying the objective
function (2.55) to add new constraints. Direct methods do not lend themselves as easily
to relaxing the brightness constancy assumption. With these it is simpler to ignore areas
where the spatial derivatives are small.

One of the more important assumptions underlying differential methods is that the
interframe motion must be small so that the approximations (2.43)-(2.45) will be valid,
and so that the spatial and temporal sampling rates will not violate the Nyquist criterion.

It is useful to perform some sample calculations to see what is meant by “small”. The
approximations $\sin\theta \approx \theta$, $\cos\theta \approx 1$ are accurate to within 1.5% to about 10° of rotation.
Approximations (2.45) and (2.47) which express the derivatives of the position vector as
the difference between the left and right rays, and which incorporate the approximation
$R \approx I + \theta\,\Omega_\times$, are thus reasonable as long as the velocity of the point in the scene is
constant between frames and $\theta < 10°$. These conditions should not be difficult to achieve
with video-rate motion sequences. The angular restriction may rule out some binocular
stereo arrangements, however.
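
The 1.5% figure can be checked directly. The short sketch below compares the exact rotation matrix, written with the Rodrigues formula, against the small-angle form $I + \theta\,\Omega_\times$ at 10°. The helper names and the choice of the y axis are assumptions made only for this illustration.

```python
import numpy as np

def cross_matrix(w):
    """Skew-symmetric matrix W such that W @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exact_rotation(axis, theta):
    """Rodrigues formula: I + sin(theta) W + (1 - cos(theta)) W^2."""
    W = cross_matrix(axis)
    return np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)

def small_angle_rotation(axis, theta):
    """First-order approximation I + theta W."""
    return np.eye(3) + theta * cross_matrix(axis)

theta = np.radians(10.0)
axis = np.array([0.0, 1.0, 0.0])            # unit vector along y

sin_error = abs(theta - np.sin(theta))      # error of sin(theta) ~ theta
cos_error = abs(1.0 - np.cos(theta))        # error of cos(theta) ~ 1
max_matrix_error = np.max(np.abs(exact_rotation(axis, theta)
                                 - small_angle_rotation(axis, theta)))
```

At 10° the cosine term dominates: `cos_error` is about 0.015, while `sin_error` is below 0.001, so the largest entry-wise error in the rotation matrix is also about 1.5%.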

The primary concern is thus not the validity of the incremental optical flow equations,
but whether the sampling rates are high enough to avoid aliasing. The Nyquist criterion
which bounds the maximum rate at which an image sequence can be sampled in space and
time can be derived as follows.

The constant brightness assumption requires that

$$E_x u + E_y v + E_t = 0 \qquad (2.59)$$

If the Fourier transform of E(x, y, t) is given by

$$\hat{E}(\xi, \eta, \nu) \qquad (2.60)$$

then

$$E_x u + E_y v + E_t \iff j(\xi u + \eta v + \nu)\,\hat{E}(\xi, \eta, \nu) \qquad (2.61)$$

If the constant brightness assumption is valid, then by linearity of the Fourier transform,
either

$$j(\xi u + \eta v + \nu) = 0 \qquad (2.62)$$

or,

$$\hat{E}(\xi, \eta, \nu) = 0 \qquad (2.63)$$

for all $\xi$, $\eta$, $\nu$.

If E(x, y) is bandlimited so that $|\xi| < \Omega_x$ and $|\eta| < \Omega_y$, then (2.62) requires that $|\nu| < \Omega_t$
where

$$\Omega_t = \left(\Omega_x |u| + \Omega_y |v|\right)_{\max} \qquad (2.64)$$

Note that $\Omega_x$, $\Omega_y$, and $\Omega_t$ represent angular frequencies and should not be confused with
the matrix $\Omega_\times$.

To avoid aliasing, the temporal sampling rate $\tau$ must satisfy $\tau > 2\Omega_t$. However, $\tau$ often
cannot be changed, for instance in video sequences where images are produced at a rate
of 30 frames/sec.² Although it is possible to design video cameras to operate at higher
rates, other factors, such as the amount of available light or interframe processing time
requirements, may limit how far one can go.

²According to the American NTSC standard. Other countries outside North America and Japan use the
PAL and SECAM standards which produce 25 frames/sec.

For a given $\tau$, the Nyquist criterion thus imposes a restriction on the maximum image
plane displacement which can be tolerated. If the spatial bandwidths $\Omega_x$ and $\Omega_y$ are
approximately the same, so that we can set $\Omega_x = \Omega_y = \Omega_s$, then

$$\tau > 2\Omega_t = 2\Omega_s\left(|u| + |v|\right)_{\max} \qquad (2.65)$$

or,

$$\left(\left|\frac{u}{\tau}\right| + \left|\frac{v}{\tau}\right|\right)_{\max} < \frac{1}{2\Omega_s} \qquad (2.66)$$

The quantities $u/\tau$ and $v/\tau$ have the units of pixels/frame, while $\Omega_s$ has units of 1/pixel.
Since the images are spatially sampled, it is also true that $0 < |\Omega_s| < 1/2$.

If the image is highly textured and contains significant energy in frequencies near the
upper limit, the maximum tolerable displacement will be around 1 to 2 pixels per frame. In
most cases, disparities in binocular stereo pairs will be greater than this. It is worthwhile to
compute some typical displacements in images generated by a moving camera. From (2.50)
and (2.51) we can find the optical flow for the following special cases:

1. Pure translation along the x direction

$$(u, v) = -\frac{b_x}{Z}(f, 0) \qquad (2.67)$$

2. Pure translation along the z direction

$$(u, v) = \frac{b_z}{Z}(x, y) \qquad (2.68)$$

3. Pure rotation of $\theta$ about $\hat{\boldsymbol{\omega}} = \hat{\mathbf{y}}$

$$(u, v) = \frac{\theta}{f}\left(x^2 + f^2,\; xy\right) \qquad (2.69)$$

In normal imaging systems the effective focal length f is several hundred times longer than
the interpixel spacing. For the purpose of calculating displacements, let f = 200 pixels. For
the first case, pure translation in the x direction, suppose the camera is moving at 30 mph
(48 km/hr) and viewing an object at a distance of 10 m while generating a video sequence
at 30 frames/sec. This could be the situation of a camera attached to a car door viewing
the side of the road. In 1/30 sec, the camera has moved 0.444 m in the x direction. We thus
find

$$(u, v) = -\frac{0.444}{10}(200, 0) = (-8.9, 0) \qquad (2.70)$$

which is considerably larger than the maximum allowed displacement.

In the second case, pure translation along the z direction, we can identify the quantity
$b_z/Z$ as 1/T, where T is the time-to-impact of the object being viewed. Suppose T = 10
sec and the frame interval is still 1/30 sec, then

$$(u, v) = \frac{1}{300}(x, y) \qquad (2.71)$$

For most, or all, of the image, i.e., x, y < 300, the displacement will be less than 1 pixel,
and hence there should be no problem in applying differential methods to compute the time
to crash. (For T < 10 sec, the displacements may become too large, but at that point it
may be too late to care.)

In the last case, pure rotation about y, the motion depends only on $\theta$ and the position
in the image. The smallest displacement occurs at the principal point, x = 0, y = 0, where

$$(u, v) = \theta\,(f, 0) \qquad (2.72)$$

Every 1° (.0175 radians) of rotation corresponds to a displacement of 3.5 pixels, with f = 200
as before. At 1/30 sec per frame, this corresponds to 30° per second, which is easily exceeded by
ordinary vibrations from moving the camera.
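
The three sample calculations can be reproduced in a few lines. The sketch below simply evaluates the special-case flow expressions with the numbers used in the text (f = 200 pixels, 30 frames/sec). The function names are hypothetical, and the sign of the rotational term follows the special-case formula as printed above, which is an assumption about the convention in use.

```python
import math

def flow_translation_x(f, bx, Z):
    """Flow for pure translation bx (per frame) along x at depth Z."""
    return (-f * bx / Z, 0.0)

def flow_translation_z(x, y, bz_over_Z):
    """Flow for pure translation along z; bz_over_Z is 1 / (time-to-impact in frames)."""
    return (x * bz_over_Z, y * bz_over_Z)

def flow_rotation_y(x, y, f, theta):
    """Flow for a pure rotation of theta radians (per frame) about the y axis."""
    return (theta * (x * x + f * f) / f, theta * x * y / f)

f = 200.0                       # focal length in pixels
frame_time = 1.0 / 30.0         # seconds per frame

# Case 1: 48 km/hr sideways motion, object 10 m away.
speed = 48.0 * 1000.0 / 3600.0  # metres per second
u1, v1 = flow_translation_x(f, speed * frame_time, 10.0)

# Case 2: time-to-impact of 10 s, evaluated at image position (300, 300).
u2, v2 = flow_translation_z(300.0, 300.0, frame_time / 10.0)

# Case 3: 1 degree of rotation per frame, evaluated at the principal point.
u3, v3 = flow_rotation_y(0.0, 0.0, f, math.radians(1.0))
```

The magnitudes are what matter for the aliasing argument: roughly 8.9 pixels per frame for the sideways translation, 1 pixel at a radius of 300 pixels for the approach with T = 10 sec, and about 3.5 pixels per degree of rotation.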

When the displacements are too large for the given frame rate and sensor dimensions,
the situation can be remedied by low-pass filtering the image, or equivalently, by reducing
the spatial sampling rate. Rewriting (2.66) as

$$\Omega_s < \frac{\tau}{2\left(|u| + |v|\right)_{\max}} \qquad (2.73)$$

gives the maximum bandwidth for a given sampling rate and maximum image plane
displacement.
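
To make the bound concrete, the sketch below evaluates it for a few displacements and converts the result into an integer subsampling factor relative to the full sampled bandwidth of 1/2. It assumes that displacements are expressed in pixels per frame, so that $\tau = 1$; the function names and the choice of rounding up are assumptions made for this illustration only.

```python
import math

def max_spatial_bandwidth(max_displacement, tau=1.0):
    """Upper bound on the spatial bandwidth for a given maximum displacement."""
    return tau / (2.0 * max_displacement)

def subsampling_factor(max_displacement, tau=1.0, full_bandwidth=0.5):
    """Smallest integer reduction of the full bandwidth that satisfies the bound."""
    ratio = full_bandwidth / max_spatial_bandwidth(max_displacement, tau)
    return max(1, math.ceil(ratio))

# Displacements taken from the sample calculations (pixels per frame).
factor_sideways = subsampling_factor(8.9)   # sideways translation example
factor_rotation = subsampling_factor(3.5)   # 1 degree of rotation per frame
factor_small = subsampling_factor(0.5)      # already within the limit
```

Under these assumptions the sideways-translation example would require reducing the resolution by roughly a factor of nine, which illustrates how much image detail is given up when large displacements must be handled by filtering alone.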

Unfortunately, it is not possible to know in advance what $(|u| + |v|)_{\max}$ will be. If
the  resolution  is  set  lower  than  necessary,  useful  information  is  lost,  while  if  it  is  set  too 
high,  aliasing  will  occur.  More  insidiously,  one  cannot  determine  from  the  local  brightness 
gradients  alone  if  aliasing  has  occurred.  One  solution  proposed  by  Anandan  [51]  is  to 
perform  motion  estimation  at  multiple  scales  by  separating  the  image  into  a  hierarchical 
pyramid  structure  in  which  each  level  represents  a  different  spatial  bandwidth  and  sampling 
rate.  Information  can  thus  propagate  from  coarse  to  fine  levels  to  determine  the  highest 
resolution  at  which  optical  flow  can  be  computed  from  brightness  gradients. 
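
The multiple-scale idea can be illustrated with a very simple pyramid. The sketch below is not the method of [51]; it only shows how repeated 2×2 block averaging halves both the resolution and the displacement measured in pixels, and how many levels would be needed before a given displacement falls under a chosen limit. The function names and the 1-pixel limit are assumptions.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [level 0, level 1, ...], each level a 2x2 block average of the previous one."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        im = pyramid[-1]
        h, w = (im.shape[0] // 2) * 2, (im.shape[1] // 2) * 2
        im = im[:h, :w]                      # drop an odd last row or column
        pyramid.append(0.25 * (im[0::2, 0::2] + im[1::2, 0::2]
                               + im[0::2, 1::2] + im[1::2, 1::2]))
    return pyramid

def coarsest_level_needed(max_displacement, limit=1.0):
    """Number of halvings until the displacement in pixels is at most `limit`."""
    level = 0
    while max_displacement / 2**level > limit:
        level += 1
    return level

rng = np.random.default_rng(1)
image = rng.random((64, 64))
pyramid = build_pyramid(image, 3)
shapes = [level.shape for level in pyramid]
levels_for_sideways = coarsest_level_needed(8.9)
```

For the 8.9-pixel displacement of the sideways-translation example, four halvings are needed before the displacement drops below one pixel, at which point a 64×64 image would have been reduced to 4×4. This gives a sense of the processing and resolution cost the text refers to.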

The pyramid structure is not the most practical option for hardware implementation
as it involves a great deal of processing, and there is not a simple alternative for finding
the appropriate filter bandwidth using only gradient information. By combining explicit
matching and differential methods, however, it should be possible to devise a reliable
system which takes advantage of the best of each. Explicit matching methods can determine
the maximum image displacements and compute the motion parameters, while differential
methods can more easily obtain information that relates motion and structure from
all parts of the image. For example, given the motion and assuming the image has been
appropriately lowpass-filtered, depth can be computed from equation (2.56) as

$$Z = -\frac{\mathbf{s} \cdot \mathbf{b}}{E_t + (\mathbf{v} \cdot \hat{\boldsymbol{\omega}})\,\theta} \qquad (2.74)$$


In  this  thesis  we  are  primarily  concerned  with  the  design  of  a  system  to  compute  relative 
motion  from  explicit  point  correspondences  and  will  not  explore  further  the  benefits,  or 
the  details,  of  interfacing  to  other  systems  based  on  different  approaches.  It  should  be 
understood,  however,  that  the  choice  of  an  explicit  strategy  does  not  rule  out  its  use  in 
conjunction  with  other  methods. 


2.4  Motion  Vision  and  VLSI 

The tremendous computational complexity of many of the algorithms for determining
motion and the need to perform these computations in real time has led to the design of
several specialized VLSI systems. Analog processing has been a major component in most
of these as it offers the possibility of performing parallel operations on large amounts of
data with compact, low-power circuits. In [52], Horn presents the theory and gives several
examples of useful computations which can be performed by analog networks.

One of the first circuits was the correlating motion detector of Tanner and Mead [53]
which was a simple 1-D detection circuit that allowed a maximum motion of ±1 pixel.
A linear array of photodiodes converted incident light to a 1-bit signal which was compared,
via a binary correlation circuit, to the stored signals from the previous cycle. The
peak correlation value at each pixel was detected by mutual inhibition among neighboring
comparators, and the output was summed on a global bus.

A later design by Tanner implemented a 2-D non-clocked array of photosensors to compute
optical flow by gradient descent on a feedback network [54], [55, Chapter 14]. This
system was limited to constant flow, as would arise from a pure translation parallel to the
image plane. In this case the error function (2.55) to be minimized reduces to

$$e_{tot}^2 = \iint \left(E_x u + E_y v + E_t\right)^2 dx\,dy \qquad (2.75)$$

Since the flow is constant, the problem is solved by taking derivatives with respect to u and
v and setting these to zero. We find that

$$\frac{\partial e_{tot}^2}{\partial u} = 2\iint \left(E_x u + E_y v + E_t\right) E_x\,dx\,dy \qquad (2.76)$$

$$\frac{\partial e_{tot}^2}{\partial v} = 2\iint \left(E_x u + E_y v + E_t\right) E_y\,dx\,dy \qquad (2.77)$$

In the gradient descent approach, currents proportional to quantities in the integrals are
fed into a negative feedback loop which drives the variable voltages representing u and v to
values which force the derivatives to zero. This circuit was designed to operate in continuous
time to avoid temporal aliasing. The first chip built was an 8×8 array with processors at
each pixel to compute the local multiplications and two global busses to carry the values of
u and v.
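
In software, the point at which both derivatives vanish can be found directly by solving the resulting 2×2 linear system, rather than by continuous-time feedback. The sketch below does this for a synthetic image pair. It is only an illustration of the constant-flow least-squares problem, not a model of the chip; the function name `constant_flow`, the use of finite differences for the brightness derivatives, and the test pattern are assumptions.

```python
import numpy as np

def constant_flow(E1, E2):
    """Least-squares constant flow (u, v) from two frames one time step apart."""
    Ey, Ex = np.gradient(0.5 * (E1 + E2))   # axis 0 is y, axis 1 is x
    Et = E2 - E1
    # Normal equations obtained by setting both derivatives of the error to zero.
    M = np.array([[np.sum(Ex * Ex), np.sum(Ex * Ey)],
                  [np.sum(Ex * Ey), np.sum(Ey * Ey)]])
    rhs = -np.array([np.sum(Ex * Et), np.sum(Ey * Et)])
    return np.linalg.solve(M, rhs)

def pattern(x, y):
    """Smooth synthetic texture used only for this demonstration."""
    return np.sin(0.3 * x) + np.sin(0.25 * y)

yy, xx = np.mgrid[0:64, 0:64].astype(float)
E1 = pattern(xx, yy)
E2 = pattern(xx - 0.5, yy - 0.2)            # pattern shifted by (0.5, 0.2) pixels
u_est, v_est = constant_flow(E1, E2)
```

For this sub-pixel shift the estimate should land close to (0.5, 0.2). A gradient descent on the same error, as performed by the feedback network, would converge to the same point because the error is quadratic in u and v.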

Other circuits developed by the Computation and Neural Systems Program group at
Caltech are described by Horiuchi et al. in [56] where they discuss a comparative study
of four experimental designs for 1-D motion estimation. Among these were a 1-D version
of Tanner’s gradient descent optical flow chip and a fully digital circuit composed of
off-the-shelf components to implement correlation. The other two designs were a pulse-coded
correlation circuit (based on a model of structures found in the auditory system of owls)
which detects time differences between neighboring pulses, and a mixed analog/digital
system to track zero-crossings of a difference of Gaussians (DOG) filtered image. In their
results, they report that the fully digital circuit, composed of a Fairchild Linear CCD 256
pixel array and a Harris RTX2001A microprocessor, had the best performance in overall
robustness, while the Tanner 1-D optical flow chip had the least reliable performance. They
also reported difficulties using gradient methods due to the 120 Hz flicker found in ordinary
room lights.

There has been a great deal of interest, motivated by the desire to reduce interchip
communication requirements, in developing one-chip circuits that incorporate photosensing
and local processing at each pixel [57]. With focal-plane processing, however, the area taken
up by the processing circuitry increases pixel size and thereby reduces the maximum array
size which can be placed on the chip. Since technology limitations restrict the maximum
die size to about 1 cm², either resolution or field of view must be sacrificed.

Gottardi and Yang [58] recently reported the development of a single chip 1-D motion
sensor in CCD/CMOS technology with a 115-pixel linear image sensor and CCD charge
subtraction circuits to perform correlation. McQuirk is currently working on a one-chip
design for a focus of expansion (FOE) detector using the direct approach developed by
Negahdaripour and Horn [45]. The system architecture and results of a preliminary test chip
are reported in [59]. In order to obtain a reasonable array size (64×64 in 2 µm technology),
McQuirk chose not to implement a fully parallel processor array, but to time-multiplex
the computation using one processor per column. Instead of the continuous-time gradient
descent method performed by Tanner’s optical flow chip, this system computes a discrete-time
iterative approximation to minimize the associated error function. Results from the
final design of the complete 64×64 array chip are not yet available.

The  common  feature  of  the  above  systems  is  that  they  deal  with  only  a  very  limited 
aspect  of  the  problem.  Most  assume  constant  optical  flow,  and  none  allow  for  rotation. 
Given  current  technology  limitations  and  the  complexity  of  computing  general  motion,  it  is 
probably safe to conclude that it cannot be done with a single chip design at any time in
the foreseeable future. One reason for designing simpler subsystems is so that they can be
combined  to  solve  more  complex  problems.  As  yet,  however,  no  one  has  built  or  proposed  a 
complete  system  which  includes  the  design  of  specialized  processors  for  computing  general 
motion,  or  relative  orientation,  in  unrestricted  environments. 

This  is  the  problem  which  is  addressed  in  this  thesis. 


Chapter  3 


Matching  Points  in  Images 


Having  chosen  to  build  the  system  based  on  an  explicit  matching  approach,  we  must  now 
determine  the  approach  for  finding  the  point  correspondences.  There  are  several  reasons 
why  obtaining  accurate  and  reliable  point  correspondences  is  a  hard  problem.  One  is  that 
the  same  features  in  two  different  images  do  not  necessarily  look  the  same  due  to  differences 
in  foreshortening.  Features  which  appear  in  one  image  may  be  occluded  or  outside  the  field 
of  view  in  the  other,  or  there  may  be  multiple  solutions  for  matching  if  there  are  repeating 
patterns  in  the  images.  The  high  computational  cost  of  computing  similarity  measures  is 
an  additional  drawback  to  obtaining  a  large  number  of  accurate  matches. 

Methods  which  have  been  proposed  for  determining  correspondences  can  be  grouped 
into  three  broad  categories:  brightness-based  methods,  gray-level  correlation,  and  edge- 
based  methods.  These  differ  primarily  in  the  types  of  features  used  and  in  their  strategy  for 
solving  the  problem.  Hybrid  methods,  which  combine  aspects  from  each  of  the  approaches, 
have  also  been  developed;  however,  these  are  best  understood  by  examining  the  major 
categories  individually.  In  this  chapter,  I  will  review  the  advantages  and  weaknesses  of 
the  different  approaches  and  discuss  their  practicality  for  hardware  implementation  with 
respect  to  the  goals  of  the  present  system. 

3.1  Brightness-Based  Methods 

The idea in brightness-based methods is to avoid explicitly searching for the best match
for each pixel by formulating a global minimization problem whose solution gives the relative
displacement of every point. These methods are similar to those developed for computing
optical flow by minimizing the error in the brightness change constraint equation. In fact,
there are only minor differences in the formulation of the two problems. In addition to
avoiding search, the two primary advantages to this approach, which are not shared by the
correlation and feature-based methods discussed next, are that information from the entire
image is used in determining the offsets and that it is possible to obtain a dense set of
correspondences, even in areas of the image which lack distinctive features.

The procedure, as described in [49] and [60], consists of assuming that the gray-levels of
corresponding points are approximately the same and finding disparity functions $d_H(x, y)$
and $d_V(x, y)$ such that, ideally,

$$E_L\left(x + \tfrac{1}{2} d_H(x, y),\; y + \tfrac{1}{2} d_V(x, y)\right) = E_R\left(x - \tfrac{1}{2} d_H(x, y),\; y - \tfrac{1}{2} d_V(x, y)\right) \qquad (3.1)$$

where $E_L(x, y)$ and $E_R(x, y)$ are the brightness functions associated with the left and right
images respectively.

Due to variations and offset between the two sensors, it is not expected that
equation (3.1) can be solved exactly. Instead, the desired solution is the one which minimizes
an error function composed of different penalty terms. Horn [49, Chapter 13] suggested

$$e_{tot}^2 = \iint \left(e_i^2 + \lambda e_s^2\right) dx\,dy \qquad (3.2)$$

where $e_i$ is the error resulting from the failure of (3.1) to hold exactly,

$$e_i = E_L - E_R \qquad (3.3)$$

and $e_s$ represents the departure from smoothness of the disparity functions as measured by
the squared Laplacian

$$e_s^2 = \left(\nabla^2 d_H\right)^2 + \left(\nabla^2 d_V\right)^2 \qquad (3.4)$$

The coefficient $\lambda$ defines the relative weighting of the two error terms.

Gennert [60] proposed a similar, though more elaborate, energy function which included
a multiplicative model for the transformation of brightnesses in the two images

$$E_R = m E_L \qquad (3.5)$$

The multiplier m takes into account changes in reflectance due both to changes in albedo
and in the orientation of the surface being imaged.

The  only  significant  difference  between  equations  (3.2)  and  (2.55),  the  error  function 
for  optical  flow,  is  that  the  constant  brightness  assumption  is  not  expressed  in  terms  of  the 
spatial  and  temporal  derivatives  of  brightness.  However,  the  derivatives  reappear  in  the 
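Euler equations which for equation (3.2) are

$$\lambda \nabla^2\left(\nabla^2 d_H\right) = -\tfrac{1}{2}\left(\frac{\partial E_L}{\partial x} + \frac{\partial E_R}{\partial x}\right)\left(E_L - E_R\right) \tag{3.6}$$

$$\lambda \nabla^2\left(\nabla^2 d_V\right) = -\tfrac{1}{2}\left(\frac{\partial E_L}{\partial y} + \frac{\partial E_R}{\partial y}\right)\left(E_L - E_R\right) \tag{3.7}$$

The functions in these equations are evaluated at the points $(x + \tfrac{1}{2}d_H(x,y),\, y)$ and
$(x - \tfrac{1}{2}d_H(x,y),\, y)$ in the left and right images, respectively.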
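
The origin of the derivative terms can be sketched as follows. This is an illustrative
derivation rather than one taken from the thesis; it assumes that the left image is sampled
at $x + \tfrac{1}{2}d_H$ and the right image at $x - \tfrac{1}{2}d_H$, and the overall sign of
the result depends on that convention. Writing the integrand of (3.2) as
$F = e_i^2 + \lambda e_s^2$, the Euler equation for a functional that depends on $d_H$ and its
second derivatives gives:

```latex
\begin{align*}
\frac{\partial F}{\partial d_H}
  + \frac{\partial^2}{\partial x^2}\frac{\partial F}{\partial d_{H,xx}}
  + \frac{\partial^2}{\partial y^2}\frac{\partial F}{\partial d_{H,yy}} &= 0, \\
\frac{\partial F}{\partial d_H} = 2 e_i \frac{\partial e_i}{\partial d_H}
  &= (E_L - E_R)\left(\frac{\partial E_L}{\partial x} + \frac{\partial E_R}{\partial x}\right), \\
\frac{\partial F}{\partial d_{H,xx}} = \frac{\partial F}{\partial d_{H,yy}}
  &= 2\lambda \nabla^2 d_H, \\
\Rightarrow\quad \lambda \nabla^2\left(\nabla^2 d_H\right)
  &= -\tfrac{1}{2}\left(\frac{\partial E_L}{\partial x} + \frac{\partial E_R}{\partial x}\right)(E_L - E_R).
\end{align*}
```

The equation for $d_V$ follows in the same way with derivatives taken with respect to $y$. The
brightness derivatives enter only through $\partial e_i / \partial d_H$, so they carry useful
information only when they are evaluated close to the correct match.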

The  aliasing  problem  discussed  in  Section  2.3.3  thus  arises  in  a  different  form.  If  the 
image is not sufficiently bandlimited, or if good initial values for $d_H$ and $d_V$ are not available,
it  is  unlikely  that  a  minimization  procedure  based  on  gradient  descent  will  converge.  If  the 
derivatives  are  evaluated  too  far  away  from  the  correct  point,  the  gradient  will  not  point 
in  the  direction  of  the  solution.  It  would  thus  be  very  difficult  to  implement  this  method 
in  circuit  form  using  analog  networks  as  was  done  for  optical  flow  by  Tanner  [54],  and  for 
finding  the  focus  of  expansion  (FOE)  by  McQuirk  [59].  Furthermore,  it  should  be  added 
that  the  full  power  of  this  method  is  not  needed  for  computing  3-D  motion  since  a  dense 
set  of  point  correspondences  is  not  required  to  solve  the  rigid-body  motion  equations. 

3.2  Gray-level  Correlation  Techniques 

Gray-level correlation merits close attention since it is widely used in commercial
applications. The theoretical basis for the use of correlation in determining point matches
between two images is the well known result from classical detection and estimation theory
that, under certain conditions, an optimum decision rule for detecting the presence of a
known signal in an observed noisy waveform can be obtained from the cross-correlation of
the signal with the waveform [61]. The decision is based on whether the value of the
correlation function is above a threshold determined from either a Bayes cost function or a
desired false alarm rate in a Neyman-Pearson test. The position of the maximum correlation
value gives the most likely position of the signal, under the hypothesis that it is present.

In a typical procedure, one image is divided into $N$, possibly overlapping, $M \times M$ blocks.
Let $i$, with $1 \le i \le N$, denote the $i$th block and let $(j,k)$ be the coordinates in the first
image of the center of the block. The correlation function of the $i$th image block centered at
$(j',k')$ in the second image is given by

$$C_i(j',k') = \sum_l \sum_m E_L(j+l,\,k+m)\,E_R(j'+l,\,k'+m) \tag{3.8}$$

where the indices $l$ and $m$ range in integer steps from $-(M-1)/2$ to $+(M-1)/2$.

The underlying assumption which makes the value of the cross-correlation function a
sufficient  statistic  for  testing  the  presence  of  a  signal,  is  that  the  observed  waveform  is 
a  stationary  white  Gaussian  noise  process  upon  which  the  signal  may,  or  may  not,  be 
superimposed.  If  the  background  noise  process  is  nonstationary,  but  is  additive,  white  and 
Gaussian,  an  optimal  test  can  still  be  formulated,  but  the  detection  threshold  corresponding 
to  a  given  false  alarm  rate  will  be  a  function  of  position  and  must  be  computed  for  each 
block.  If  the  noise  is  non-white,  a  “whitening”  filter  should  be  applied  before  computing 
the  correlation  function. 

In real images, the background process is generally non-stationary and non-white. When
different cameras are used, sensor offset, combined with differences in the illumination of
the same object viewed from two different positions, will ensure that the brightness values
measured from the same feature will almost never be identical in the two images, even in the
absence of other noise. In addition, with either one or two cameras, the background noise
(which usually means the other features in the image as well as variations in the number
of photons collected) will seldom have zero mean value. Practical methods for eliminating
the  effects  of  sensor  and  illumination  differences  are  to  preprocess  the  image  data  with  a 
band-pass  filter  to  both  remove  dc  offsets  and  reduce  the  variance  of  high  frequency  noise, 
or to compute the normalized correlation coefficient, defined by

$$\rho_i(j',k') = \frac{\frac{1}{M^2}\,C_i(j',k') - \mu_L(j,k)\,\mu_R(j',k')}{\sigma_L(j,k)\,\sigma_R(j',k')} \tag{3.9}$$

where $C_i(j',k')$ is the correlation function of (3.8), and $\mu_L$, $\mu_R$ and $\sigma_L$, $\sigma_R$
denote the sample means and standard deviations of $E_L$ and $E_R$ over their respective blocks.

It can be easily verified that the normalized correlation coefficient is unchanged by any
linear transformation of the brightness functions, $E_L$ and $E_R$. The search for the maximum
correlation value of the $i$th block is confined to a pre-specified window of area $A$ in the
second image. The total computational cost for determining the best match for all $N$ blocks
is therefore $O(NM^2A)$ if the function is evaluated at every offset of each search window.
If $M$ is large enough, the total complexity may be reduced using fast Fourier transform
(FFT) techniques to $O(NA\log_2 A)$. The cost of preprocessing the images with a band-pass
filter is $O(L^2\log_2 L)$ for $L \times L$ images and is $O(L^2M^2)$ for computing the local means and
variances needed for the correlation coefficient in (3.9).
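
The invariance property can be checked numerically. The following is a minimal sketch in
plain Python, not code from the thesis: the helper names `block`, `correlation` and `ncc`
and the small test images are hypothetical, and the right image is simply the left image
with an assumed gain of 2 and offset of 10.

```python
import math

def block(img, cj, ck, M):
    """Return the M x M block of img centered at (cj, ck) as a flat list."""
    h = (M - 1) // 2
    return [img[cj + l][ck + m]
            for l in range(-h, h + 1) for m in range(-h, h + 1)]

def correlation(a, b):
    """Unnormalized correlation of two flattened blocks, in the form of (3.8)."""
    return sum(x * y for x, y in zip(a, b))

def ncc(a, b):
    """Normalized correlation coefficient of two flattened blocks, as in (3.9)."""
    n = len(a)  # n = M^2
    mu_a, mu_b = sum(a) / n, sum(b) / n
    sd_a = math.sqrt(sum((x - mu_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mu_b) ** 2 for y in b) / n)
    if sd_a == 0 or sd_b == 0:
        return 0.0  # featureless block: coefficient undefined, report no match
    return (correlation(a, b) / n - mu_a * mu_b) / (sd_a * sd_b)

left = [[3, 7, 1, 4, 9],
        [6, 2, 8, 5, 0],
        [1, 9, 4, 7, 3],
        [8, 0, 6, 2, 5],
        [4, 5, 3, 9, 1]]
# Hypothetical right image: the same scene with a gain of 2 and an offset of 10.
right = [[2 * v + 10 for v in row] for row in left]

a = block(left, 2, 2, 3)
rho_same = ncc(a, block(right, 2, 2, 3))   # same location, transformed brightness
rho_other = ncc(a, block(right, 2, 1, 3))  # block displaced by one column
```

Here `rho_same` evaluates to 1 up to rounding despite the gain and offset, while the
displaced block gives a much lower coefficient.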

The  cost  of  brute  force  search  by  computing  the  correlation  function  at  every  offset  of  a 
large search window is prohibitively high, even with modern fast processors. Systems which
have  been  designed  to  perform  matching  based  on  gray-level  correlation  generally  implement 
some  intelligent  method  for  reducing  the  search  space.  One  simple  method,  suggested  by 
Barnea  and  Silverman  [62],  is  to  use  a  sub-optimal,  but  more  easily  computed,  similarity 
measure  such  as  the  sum  of  absolute  values  of  differences 

$$V_i(j',k') = \sum_l \sum_m \left| E_L(j+l,\,k+m) - E_R(j'+l,\,k'+m) \right| \tag{3.10}$$

with the best match being given by the location of the minimum value of $V_i$.
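
A brute-force search using this measure can be sketched as follows. This is an illustrative
software sketch only; the function names, the square search window of half-width `radius`,
and the synthetic ramp image (chosen so that the result can be checked by hand) are
assumptions and do not come from the systems discussed here.

```python
def sad(left, right, j, k, jp, kp, M):
    """Sum of absolute differences between the M x M block of `left` centered
    at (j, k) and the block of `right` centered at (jp, kp), as in (3.10)."""
    h = (M - 1) // 2
    return sum(abs(left[j + l][k + m] - right[jp + l][kp + m])
               for l in range(-h, h + 1) for m in range(-h, h + 1))

def best_match(left, right, j, k, M, radius):
    """Evaluate every offset within +/-radius of (j, k) that keeps the block
    inside `right`; return (minimum SAD, (jp, kp))."""
    h = (M - 1) // 2
    rows, cols = len(right), len(right[0])
    best = None
    for jp in range(max(h, j - radius), min(rows - h, j + radius + 1)):
        for kp in range(max(h, k - radius), min(cols - h, k + radius + 1)):
            score = sad(left, right, j, k, jp, kp, M)
            if best is None or score < best[0]:
                best = (score, (jp, kp))
    return best

# Synthetic test data: a brightness ramp, and a copy shifted down by one row
# and right by two columns (pixels with no source are set to 0).
left = [[10 * r + c for c in range(10)] for r in range(10)]
right = [[left[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(10)]
         for r in range(10)]

result = best_match(left, right, 4, 4, 3, 3)  # block centered at (4, 4)
```

For this synthetic shift the minimum is found at (5, 6) with a score of zero. The nested
loops make the $O(M^2A)$ cost per block explicit, which is what motivates the reduced search
strategies used in practical systems.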

A detection test based on the sum of absolute values of differences is a computationally
efficient approximation to one based on the correlation coefficient, and, under certain
conditions, the two are in fact equivalent. If $E_L$ and $E_R$ are quantized to integer values, the
absolute value of the difference between two pixel values is always less than or equal to the
square of the difference. That is,

$$V_i(j',k') = \sum_l \sum_m \left| E_L(j+l,\,k+m) - E_R(j'+l,\,k'+m) \right| \le \sum_l \sum_m \left( E_L(j+l,\,k+m) - E_R(j'+l,\,k'+m) \right)^2 \tag{3.11}$$

Using equation (3.9) and the definitions of the sample means and variances

$$\mu_L = \frac{1}{M^2} \sum_l \sum_m E_L(j+l,\,k+m) \tag{3.12}$$

and

$$\sigma_L^2 = \frac{1}{M^2} \sum_l \sum_m \left( E_L(j+l,\,k+m) - \mu_L \right)^2 \tag{3.13}$$

it can be shown that

$$V_i(j',k') \le M^2 \left[ (\sigma_L - \sigma_R)^2 + (\mu_L - \mu_R)^2 + 2\sigma_L\sigma_R(1 - \rho_i) \right] \tag{3.14}$$

Thus the test which accepts a match when $\rho_i > \tau_c$ is equivalent to the test which accepts
a match when $V_i < \tau_V = M^2\left[ (\sigma_L - \sigma_R)^2 + (\mu_L - \mu_R)^2 + 2\sigma_L\sigma_R(1 - \tau_c) \right]$. Also, as long as
$(\mu_L, \mu_R)$ and $(\sigma_L, \sigma_R)$ are approximately constant over the search window, the position of
the minimum value of $V_i$ will be the same as that of the maximum value of $\rho_i$.
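
The bound can be checked on a small example. The sketch below is illustrative only; the
function names and the two integer-valued 3 x 3 blocks (flattened to lists) are hypothetical.
It computes the sum of absolute differences, the sum of squared differences, and the
right-hand side of (3.14).

```python
import math

def stats(block):
    """Sample mean and standard deviation of a flattened block."""
    n = len(block)
    mu = sum(block) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in block) / n)
    return mu, sigma

def sad_ssd_bound(a, b):
    """Return (V, sum of squared differences, right-hand side of (3.14))."""
    n = len(a)  # n = M^2
    mu_a, sd_a = stats(a)
    mu_b, sd_b = stats(b)
    rho = (sum(x * y for x, y in zip(a, b)) / n - mu_a * mu_b) / (sd_a * sd_b)
    v = sum(abs(x - y) for x, y in zip(a, b))
    ssd = sum((x - y) ** 2 for x, y in zip(a, b))
    bound = n * ((sd_a - sd_b) ** 2 + (mu_a - mu_b) ** 2
                 + 2 * sd_a * sd_b * (1 - rho))
    return v, ssd, bound

a = [2, 8, 5, 9, 4, 7, 0, 6, 2]
b = [3, 7, 5, 8, 4, 9, 1, 6, 2]
v, ssd, bound = sad_ssd_bound(a, b)
```

For these blocks the sum of absolute differences is 6, the sum of squared differences is 8,
and the bound evaluates to 8 up to rounding. The bracketed expression in (3.14) is an exact
rewriting of the sum of squared differences, so the only inequality involved is the one in
(3.11).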

Many commercial hardware systems for feature detection have been developed based on
the measures just described. A few examples among the many currently available systems
are: (1) The alignment system developed by Cognex Corporation, which uses normalized
correlation search [63]. This system, which is contained on a single 340mm x 366mm
printed-circuit board, along with image capture hardware, frame memory and other interfaces,
uses intelligent search strategies and clever programming tricks to achieve high-speed alignment.
In a recent brochure, Cognex claims to be able to align a 128x128 pixel template in a
500x400 pixel image in 200 milliseconds. (2) The real-time image processing and alignment
board by Sharp Digital Information Products, Inc., which fits into a personal computer.
Using software which runs on the host computer's CPU and which interfaces to two
special-purpose processor boards, they claim that the system can find a 100x100 template within
a 512x512 search area in less than 100 milliseconds. (3) The MaxVideo 20 system by
Datacube, Inc., which is perhaps the most widely used system, interfaces to a VME bus and
performs numerous image processing applications, along with alignment. (4) The STI3220
single-chip motion estimation processor designed by SGS-Thomson Microelectronics, used
to implement the MPEG data compression algorithm at video rates.¹ This chip finds
the minimum sum-of-absolute-values-of-differences between two blocks over a maximum
displacement of +7/−8 pixels in both horizontal and vertical directions.

Some experimental designs have also been based on gray-level correlation. Recently
Hakkarainen [65] developed a test system in analog CCD/CMOS technology to compute
stereo disparities along a single horizontal row of pixels (parallel epipolar geometry
assumed). The matching circuit incorporated a 40x40 pixel absolute-value-of-difference array
designed to find candidate matches within a maximum disparity range of 11 pixels.

¹ MPEG is a motion picture compression technique based on coding the offsets between blocks in two
frames. See [64] for more details.

Despite the preponderance of gray-level correlation in commercial vision applications
and  elsewhere,  it  does  have  limitations  which  make  it  unsuitable  for  computing  general  3-D 
motion.  An  important  drawback  is  that  it  is  very  sensitive  to  differences  in  foreshortening 
of  surfaces  which  are  viewed  from  different  angles.  Unless  the  motion  of  the  cameras  is 
defined  by  a  pure  translation  parallel  to  the  image  plane,  and  all  objects  in  the  scene  are 
at  the  same  depth,  the  two  images  will  not  simply  be  shifted  versions  of  each  other.  The 
conditions  for  optimality  of  the  correlation-based  decision  rule  no  longer  hold  in  the  presence 
of  foreshortening  since  the  signal  is  distorted.  The  systems  just  cited  were  developed  for 
applications  in  which  foreshortening  is  not  a  major  problem.  In  industrial  settings,  the 
scene  structure  can  be  controlled,  and  the  choice  of  the  template  to  be  matched  is  guided 
by  the  user.  Furthermore,  the  parts  to  be  located  or  aligned  are  usually  confined  to  a  single 
plane  which  is  held  at  a  fixed  orientation  to  the  optical  axis.  In  the  MPEG  compression 
algorithm,  small  offset  errors  are  not  very  important  because  the  human  visual  system 
cannot  perceive  fine  spatial  detail  in  moving  image  sequences.  In  more  general  settings, 
however,  particularly  when  long  baselines  are  used  as  is  often  the  case  in  binocular  stereo, 
foreshortening  cannot  be  neglected. 

A  second  limitation  of  gray-level  correlation  is  that  it  is  not  by  itself  sufficient  for 
computing  reliable  matches  given  arbitrary  blocks  within  an  image.  The  performance  of 
a  correlation  test  depends  on  the  signal-to-noise  ratio,  which  is  low  if  the  block  does  not 
contain  distinctive  features.  For  example,  it  is  very  difficult  to  find  the  correct  offset  for 
a  patch  of  constant  or  smoothly  varying  brightness  within  a  larger  region  also  of  constant 
or  smoothly  varying  brightness.  Industrial  applications  avoid  this  problem  by  using  large, 
previously  selected  templates.  However,  if  the  blocks  are  chosen  by  arbitrarily  dividing 
the  image,  there  is  no  guarantee  that  each  one  can  be  reliably  matched.  Since  further 
processing,  such  as  finding  edges,  is  required  to  determine  the  distinctiveness  of  each  block, 
methods  which  are  based  on  matching  the  edges  themselves  are  more  attractive  in  general 
than  those  based  on  gray  level  correlation. 

3.3  Edge-Based  Methods 

Edges, which are locations of rapid and significant change in the image brightness
function, are usually caused by changes in the surface reflectance or orientation of the imaged
objects. These occur at changes in surface markings as well as at the boundaries of objects
and are thus intrinsic characteristics of the scene which transform under the same rules of
rigid body motion as the surfaces with which they are associated.

Among  the  advantages  of  using  edges,  as  opposed  to  gray  levels,  as  matching  primitives 
are  that  they  are  insensitive  to  photometric  effects  and  that  they  are  less  strongly  affected 
than gray levels by foreshortening. A simple example of the latter issue is shown in Figure 3-1,
which depicts a hypothetical perspective transformation of a wire-frame box. Although
the  lengths  of  some  of  the  sides  change,  the  edge  patterns  at  the  corner  points  retain 
enough  similarity  that  they  can  be  uniquely  matched  between  the  two  views.  The  often- 
cited  disadvantage  of  edge-based  methods — that  they  can  only  generate  sparse  matches — is 
a  problem  for  obtaining  depth  from  binocular  stereopairs,  but  not  for  computing  motion. 

Edge-based  methods  can  be  divided  into  two  categories  according  to  the  manner  in  which 
they  represent  edges.  Methods  in  the  first  category  operate  directly  on  the  binary  image, 
or  edge  map,  produced  by  the  edge  detection  algorithm,  using  some  form  of  correlation 
matching  similar  to  those  described  for  gray  level  images.  Methods  in  the  second  category, 
however,  take  the  output  of  the  edge  detector  and  extract  higher-level  primitives,  such  as 
lines  and  corners.  The  attributes  of  these  primitives,  i.e.,  length,  end-point  coordinates, 
direction,  etc.,  are  then  compiled  into  a  symbolic  description  of  the  principal  features  of 
the  image,  originally  referred  to  by  Marr  as  the  full  primal  sketch  [66].  Matching  is  then 
performed  by  searching  the  feature  space  for  the  best-matching  sets  of  attributes. 

There are many possible variations on methods for using the binary representations of
edge locations for correlation. Novak [67] lists several different similarity measures and
discusses their relative merits. Wong [68] describes different processing techniques that
can be applied to the edges to yield more reliable matches. Nishihara [69] developed a
binary correlation method based on the sign representation of the image convolved with a
Laplacian of Gaussian operator, $\nabla^2 G$, where $G$, given by

$$G = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{3.15}$$

is the Gaussian smoothing filter of bandwidth proportional to $1/\sigma$. The zero-crossings of
the $\nabla^2 G$-filtered image, which occur at local maxima in the smoothed brightness gradient,
were first proposed by Marr and Hildreth [70] as markers for edges. In Nishihara's method,
edges are implicitly represented by encoding the sign bit of $\nabla^2 G * I$ rather than by
explicitly locating its zero-crossings. He showed that this representation, whose auto-correlation
function is sharply peaked at the origin, permits higher resolution disparity measurements
than correlation using the values of $\nabla^2 G * I$ itself. This algorithm has been implemented
as a stand-alone system designed on a VME bus with a video rate Laplacian-of-Gaussian
convolver. The present version of the system allows disparity measurements at an arbitrary
image location in approximately 400 microseconds. The sign-correlator algorithm operates
on a single 6U (233.4mm x 160mm) VME bus board and implements 36 parallel correlators
that run at a 10 MHz pixel rate. The Laplacian-of-Gaussian convolution is performed by a
second 6U VME bus board that takes 10 MHz digital video input from a Datacube maxbus
from the two cameras to produce two 16-bit digital video raster signals. These two boards
fit in a VME bus box along with a video digitizer board, a single board computer, and a
motor controller board for the camera head.²
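
The sign representation can be illustrated in software. The following plain-Python sketch is
not a description of Nishihara's hardware; the kernel radius, the zero-sum adjustment of the
truncated kernel, the convention that the bit is 1 where the filtered value is negative, and
the function names are all assumptions made for illustration.

```python
import math

def log_kernel(sigma, radius):
    """Sampled Laplacian-of-Gaussian kernel, shifted so that it sums to zero."""
    k = []
    for i in range(-radius, radius + 1):
        row = []
        for j in range(-radius, radius + 1):
            r2 = i * i + j * j
            g = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4 * g)
        k.append(row)
    mean = sum(map(sum, k)) / (2 * radius + 1) ** 2
    return [[v - mean for v in row] for row in k]

def sign_map(img, sigma=1.0, radius=4):
    """Sign bit of the filtered image at interior pixels (1 where the value is
    negative, 0 otherwise); border pixels are left as None."""
    ker = log_kernel(sigma, radius)
    rows, cols = len(img), len(img[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(radius, rows - radius):
        for c in range(radius, cols - radius):
            acc = 0.0
            for i in range(-radius, radius + 1):
                for j in range(-radius, radius + 1):
                    acc += ker[i + radius][j + radius] * img[r + i][c + j]
            out[r][c] = 1 if acc < 0 else 0
    return out

# Vertical step edge: dark for columns below 8, bright from column 8 onward.
img = [[0 if c < 8 else 100 for c in range(16)] for _ in range(12)]
signs = sign_map(img)
```

The sign bit changes between columns 7 and 8, so the edge location is carried by the
binary map without any explicit zero-crossing search.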

From the viewpoint of detection theory, binary correlation methods based on the edge
maps offer the same advantages as gray-level correlation without several of the
disadvantages. As mentioned, edges are less sensitive than surfaces to foreshortening. In addition,
it is much easier to test the reliability of matches from edge-based correlation. As will be
shown in Chapter 6, reliability is directly related to edge density, which can be determined
by simply counting the number of edge pixels in the blocks being compared.

² K. Nishihara, personal communication, September 1992.

The accuracy of block-correlation techniques, gray-level or binary, is inherently limited,
however, by the implicit assumption that the entire block is at the same disparity. The
advantage of symbolic matching techniques is that they can provide true point-to-point
matching, since corners are matched to corners and line segments to line segments. There are
many algorithms which have been developed to perform search on the set of extracted image
features. They differ primarily in their choice of attributes and in the constraints which are
imposed to reduce both the search space and the number of false matches. Grimson [71]
developed a hierarchical method based on a coarse-to-fine matching of zero-crossing contours
of the same sign from the image convolved with Laplacian-of-Gaussian operators, $\nabla^2 G$, with
different values of $\sigma$. He imposed both consistency and figural continuity constraints to limit
false matches and to effectively map complete contours. Ayache and Faverjon [72] proposed
creating neighborhood graph descriptions from the extracted line segments in each image
and then determining the largest connected components of a disparity graph built from
the two descriptions. Matches are validated by imposing global continuity constraints and
rejecting any connected components with too few members. Fleck [73] recently proposed
another variation on these methods by introducing a topological pre-matching filter which
provides a stronger test than allowed by consistency and figural continuity constraints.

The  primary  disadvantage  of  symbolic  matching  techniques  with  respect  to  the  design 
goals  of  the  3-D  motion  system  is  that  they  cannot  both  be  easily  implemented  in  simple 
hardware  and  be  expected  to  operate  at  video  rates.  Reducing  the  edges  to  primary  features 
and  building  the  symbolic  descriptions  requires  processing  and  memory  resources  that  are 
beyond  the  capabilities  of  single-chip  systems.  Binary  correlation,  on  the  other  hand,  can 
be  implemented  relatively  cheaply.  The  primary  expense  is  not  in  computing  the  correlation 
measure, which requires much less hardware than if gray levels are used, but in initially
computing the edge maps. In the sign-correlation system developed by Nishihara, for
example, only one 6U VME board is devoted to the actual correlation operation, while two
boards  and  a  Datacube  image  processor  are  required  for  capturing,  digitizing,  and  filtering 
the  images. 

As  the  overview  in  this  chapter  has  shown,  there  is  no  simple  method  that  can  provide 
accurate  and  reliable  point  correspondences  in  all  situations.  The  procedure  which  is  best 
suited to the present system is the one that provides the best tradeoff between the
requirements for accuracy and simplicity. Among the different approaches for determining point
correspondences, the block matching procedures based on computing similarity measures
between edges are the simplest to implement in hardware. Since edges can be represented
by binary values, computations can be performed as Boolean operations. Furthermore, the
number  of  edges  in  each  block  provides  an  easily  computed  measure  of  the  reliability  of 
the  match.  The  primary  drawback  of  block  matching  procedures  is  their  limited  accuracy 
due  to  the  assumption  that  the  entire  block  is  at  the  same  depth.  We  will  have  to  ensure 
that  enough  matches  are  found  so  that  on  average  the  error  will  go  to  zero  in  order  for  the 
motion  algorithm  to  compute  accurate  estimates  of  the  camera  motion. 
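
The Boolean character of these computations can be sketched in a few lines. This is an
illustrative example only: the scoring method actually used by the matching circuits is the
subject of Chapter 6 and is not reproduced here, and the bit packing, the function names and
the two 4 x 4 edge blocks are hypothetical.

```python
def pack(rows):
    """Pack a binary block (list of rows of 0/1 values) into one integer."""
    word = 0
    for row in rows:
        for bit in row:
            word = (word << 1) | bit
    return word

def popcount(word):
    """Number of 1 bits in a non-negative integer."""
    return bin(word).count("1")

def compare(block_a, block_b):
    """Return (edge pixels in a, edge pixels in b, coincident edge pixels,
    mismatched pixels) using only AND, XOR and bit counting."""
    a, b = pack(block_a), pack(block_b)
    return popcount(a), popcount(b), popcount(a & b), popcount(a ^ b)

a = [[0, 1, 0, 0],
     [0, 1, 0, 0],
     [0, 1, 1, 1],
     [0, 0, 0, 0]]
b = [[0, 1, 0, 0],
     [0, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]

counts = compare(a, b)
```

For these two blocks each contains five edge pixels, four of which coincide, and two pixel
positions differ. The edge counts are the quantity referred to above as an easily computed
measure of reliability, while the AND and XOR counts are two possible raw similarity scores.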


Chapter  4 


System  Architecture 


We  can  now  formulate  a  plan  for  the  architecture  of  a  system  to  compute  3-D  motion 
with  specialized  analog  and  digital  VLSI  processors  based  on  the  diagram  of  Figure  4-1. 
The  two  input  images  acquired  at  the  different  camera  positions  will  be  referred  to  as  left 
and  right,  regardless  of  whether  this  terminology  reflects  their  true  spatial  disposition.  If 
the  images  are  acquired  by  the  same  camera,  the  two  input  blocks  should  be  considered  as 
memory  buffers. 


Figure  4-1:  System  block  diagram 


Two  major  trends  can  be  discerned  in  observing  the  processing  and  data  flow  shown  in 
this  diagram.  The  first  is  that  the  amount  of  data  decreases  significantly  from  left  to  right, 
from  the  thousands  of  pixels  in  the  input  images,  to  the  reduced  binary  edge  maps,  and 
then  to  the  set  of  point  correspondences  that  are  used  to  compute  the  few  numbers  which 
characterize  the  motion.  The  second  trend,  however,  is  that  computational  complexity 
increases  just  as  significantly  in  the  same  direction.  The  mix  of  analog  and  digital  processing 
which is the most power- and area-efficient for a given task is largely determined by the
ratio of data to the complexity of the operations performed on it.

Working backwards from right to left, we see that the ultimate goal of the processing
performed on the two images is to extract a set of point correspondences which can be used
in  the  final  stage  to  determine  the  motion.  Given  the  complexity  of  solving  the  motion 
equations,  it  is  clear  that  standard  digital  processing  techniques  are  required.  Any  one  of 
the  many  currently  available  powerful  microprocessors,  such  as  the  TI  TMS320C40,  or  the 
Motorola  68040,  can  certainly  do  the  job.  However,  since  the  ultimate  goal  is  to  build  a 
low-power  system,  we  need  to  use  the  simplest  processor  that  is  adequate  for  the  task.  In 
this  thesis,  I  did  not  attempt  to  design  a  minimal  custom  digital  circuit  to  solve  the  motion 
equations.  However,  I  will  show,  in  Chapter  7,  how  the  complexity  of  the  motion  algorithm 
can be reduced so that it can be implemented on a low-end processor or microcontroller.

The  set  of  point  correspondences  which  are  fed  into  the  motion  algorithm  are  best  found 
by  matching  edges  using  binary  correlation,  as  was  concluded  at  the  end  of  Chapter  3.  In  the 
second  processing  stage  we  can  build  a  pipelined  array  of  matching  circuits,  each  of  which 
computes  for  a  specific  patch  in  one  edge  map  the  translational  offset  which  brings  it  into 
alignment  with  the  most  similar  patch  in  the  second  edge  map.  The  search  is  restricted  to  a 
predefined  area  whose  dimensions  should  be  user-controllable  according  to  the  application. 
Since  the  search  window  may  need  to  be  quite  large  if  the  baseline  between  the  two  camera 
positions  is  long  or  if  any  amount  of  rotation  is  involved,  it  is  necessary  to  use  a  scoring 
method which has a very low false-alarm rate. Given that repeating patterns frequently
occur  in  real  scenes,  a  similarity  measure  alone  is  not  sufficient  to  achieve  an  acceptable  error 
rate.  In  Chapter  6,  I  will  present  the  scoring  method  to  be  implemented  by  the  matching 
circuits  as  well  as  discuss  the  tests  which  are  included  in  the  decision  rule  to  minimize  the 
number  of  false  matches. 

Because the edge signals are binary, computing the scores for each offset requires
relatively little circuitry and can be easily done in digital logic. Tallying the scores and
determining  the  best  match,  however,  are  considerably  more  complex.  In  Part  III  of  this 
thesis,  I  will  describe  the  design  of  a  mixed  analog  and  digital  circuit  for  finding  the  best 
match  and  compare  it  to  a  purely  digital  implementation. 

Edge detection is performed in the first stage of the motion system by operating
directly on the signals acquired by the photosensors. Here, there is a tremendous amount of
data to be processed, but the operations involved, which are computing local averages and
differencing neighboring pixel values, are relatively simple. We thus have a situation where
analog processing can offer significant advantages over standard digital methods. Detecting
edges  efficiently  in  analog  VLSI,  however,  requires  an  algorithm  which  is  adapted  to  the 
technology,  unlike  most  standard  methods  which  were  designed  for  digital  computers.  The 
multi-scale  veto  (MSV)  algorithm,  presented  in  the  next  chapter,  was  thus  developed  to 
take  advantage  of  operations  which  are  easily  performed  in  analog,  while  avoiding  those 
that  are  not.  Part  II  of  this  thesis  is  devoted  to  describing  the  design,  fabrication  and 
testing  of  a  CCD-CMOS  edge  detector  implementing  the  multi-scale  veto  algorithm,  which 
is  a  prototype  of  one  that  could  be  used  in  the  3-D  motion  system  presented  above. 


Chapter  5 


An  Algorithm  for  an  Analog  Edge  Detector 


The  bulk  of  hardware  resources  in  almost  all  image  processing  systems  is  dedicated  to 
data  storage  and  to  the  initial  operations  performed  on  the  brightness  values  acquired  by  the 
sensors.  One  image  typically  contains  several  tens  to  hundreds  of  thousands  of  pixels,  each 
usually digitized to 8 bits. Even simple computations, such as adding and subtracting pixel
values, require substantial processing due to the large amount of data involved. Given that
relatively  low  precision  is  required,  however,  there  is  a  clear  opportunity  for  analog  circuits 
to  perform  many  of  the  initial  processing  tasks  on  the  image  data.  Analog  circuits  which 
are  specifically  designed  for  a  task  can  perform  arithmetic  and  logic  operations  with  6-8 
bits  precision  in  much  less  area  than  an  equivalent  digital  implementation.  Furthermore, 
by  remaining  in  the  analog  domain,  there  is  no  need  to  digitize  the  signal  from  the  sensor 
before  it  can  be  processed. 

The  multi-scale  veto  (MSV)  algorithm  described  in  this  chapter  was  developed  to  solve 
two  problems.  The  first  was  that  we  needed  an  edge  detection  algorithm  which  could 
be  efficiently  implemented  on  a  fully  parallel  analog  processor.  The  second  was  that  we 
also  needed  an  algorithm  to  accurately  localize  edges  without  being  overly  sensitive  to 
noise.  In  this  chapter  I  will  discuss  how  both  problems  were  addressed,  presenting  first 
some  background  on  classical  edge  detection  methods  to  explain  why  it  was  decided  to 
develop  a  new  method  rather  than  to  encode  an  existing  one  into  a  circuit  design.  I  will 
also  introduce,  at  a  conceptual  level,  circuit  models  for  implementing  the  MSV  algorithm. 
The more detailed design description will be saved, however, for Part II where the actual
prototype processor which was built based on these models is presented. In the final section,
I will present results from simulating the algorithm on a pair of image sequences which will
be  seen  again  in  Chapters  6  and  7  to  demonstrate  the  matching  procedure  and  the  motion 
algorithm. 

It  should  be  noted  that  most  of  this  chapter  is  derived  from  a  previously  published 
paper  [74]  in  which  the  multi-scale  veto  algorithm  was  first  described. 

5.1  The  Multi-Scale  Veto  Rule 

The  problem  of  edge  detection  is  to  find  and  mark  locations  of  significant  change  in  the 
image  brightness  function  that  are  due  to  changes  in  the  reflectance  of  objects  in  the  scene, 
while  ignoring  any  changes  caused  by  high  spatial  frequency  features  attributable  to  noise. 
‘Noise’  is  not  a  well-defined  term,  however,  as  it  is  used  to  refer  both  to  random  fluctuations 
in  the  number  of  photons  collected  as  well  as  to  small-scale  ‘unimportant’  features  in  the 
image.  Noise  can  be  removed  by  applying  a  linear  low-pass  smoothing  filter.  However, 
this has the effect of attenuating all high frequency components indiscriminately and
introducing uncertainty in the edge locations. Nonlinear methods, such as median filtering,
which preserve important edges and remove noise are also possible. However, these require
more computation than linear filtering and generally cannot be implemented by
convolution. The MSV algorithm was designed to overcome the problems associated with standard
linear  filtering  methods.  Its  circuit  implementation  is  conceptually  straightforward,  and  it 
incorporates  a  simple  procedure  for  the  user  to  select  the  types  of  features  which  are  to  be 
defined  as  noise. 

In  the  MSV  algorithm,  edges  are  defined  as  sharp  changes  in  brightness  which  are 
significant  over  a  range  of  spatial  scales.  In  order  to  test  for  the  presence  of  an  edge, 
a  sequence  of  low-pass  filters  of  decreasing  bandwidth  is  applied  to  the  image,  and  the 
differences  between  the  smoothed  brightness  values  of  neighboring  pixels  are  computed.  An 
edge  exists  between  two  pixels  if  the  difference  in  their  values  is  above  a  threshold,  which 
is  specified  for  each  filter,  at  all  levels  of  smoothing.  If  the  threshold  test  is  failed  for  any 
filter,  the  edge  is  vetoed. 

The  rationale  behind  the  multi-scale  veto  method  can  be  explained  by  observing  how  it 
treats different types of features. Let $x_k[m,n]$ denote an array of sampled brightnesses which
has been convolved with the $k$th low-pass filter. Let $y_k[m,n] = x_k[m,n] - x_k[m,n-1]$ denote
the differences in the smoothed brightnesses in the direction of the second coordinate, and
$G_k[m,n]$ the attenuation of the difference signal, such that $y_k[m,n] = G_k[m,n]\,y_0[m,n]$. The
filters are ordered in decreasing bandwidth so that $G_k[m,n] > G_{k+1}[m,n]$. Let $\tau_k$ denote
the threshold for the $k$th filter, and suppose that there is an abrupt change in brightness
between $n = 0$ and $n = -1$ such that $y_0[m,0] = A$, with $A$ a positive constant.

Formally stated, an edge will be marked at $n = 0$ only if

$$A > \frac{\tau_k}{G_k[m,0]} \tag{5.1}$$

for $k = 0, \ldots, N-1$, where $N$ is the number of filters applied.

An example is illustrated in Figure 5-1 for the cases of an ideal step edge and an isolated
noise spike. The step edge is marked at $n = 0$ because the differences $y_0[m,0]$ and $y_k[m,0]$
are both above threshold. No other locations are marked, although $y_k[m,n] \neq 0$ in general,
and for some $n$ may even be greater than $\tau_k$, because $y_0[m,n] = 0$ for all $n \neq 0$. Hence the
unsmoothed differences will veto the marking of an edge everywhere except at $n = 0$. In
the case of the isolated noise spike, the difference at $n = 0$, which is of the same magnitude
as for the step, passes the threshold test for the unsmoothed data. However, it fails for
the smoothed data since the isolated spike is attenuated more strongly than the step edge,
and hence no edge is marked. In general, it may be observed that while the bandwidth
and threshold of the narrowest-band, or largest scale, filter determine the effectiveness
with which noise and small features are removed, the widest-band, or smallest scale, filter
determines the accuracy with which edges are localized.
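
The behavior described for the step and the spike can be reproduced with a short 1-D sketch. It is an illustration only, not the circuit implementation: it assumes a [1, 2, 1]/4 smoothing kernel applied once per scale, replicated borders, and thresholds of 0.6 times the step-edge attenuations 1, 0.5 and 0.375. The function names `smooth` and `msv_edges` and the value 0.6 are hypothetical choices made for this example.

```python
def smooth(x):
    """One smoothing cycle with the 1-D kernel [1, 2, 1]/4 (borders replicated)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + 2 * x[i] + x[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def msv_edges(x, thresholds):
    """Return positions i where |x[i] - x[i-1]| exceeds the threshold at every scale.

    thresholds[0] is tested on the unsmoothed data, thresholds[1] after one
    smoothing cycle, and so on. A single failed test vetoes the edge.
    """
    alive = [False] + [True] * (len(x) - 1)  # position 0 has no left neighbor
    for tau in thresholds:
        for i in range(1, len(x)):
            if abs(x[i] - x[i - 1]) <= tau:
                alive[i] = False  # veto
        x = smooth(x)
    return [i for i, a in enumerate(alive) if a]


# Assumed thresholds: tau_k = G_k * tau_0 with a step-edge model and tau_0 = 0.6.
taus = [0.6 * g for g in (1.0, 0.5, 0.375)]

step = [0.0] * 8 + [1.0] * 8          # ideal step between positions 7 and 8
spike = [0.0] * 16
spike[8] = 1.0                        # isolated noise spike of the same height

step_edges = msv_edges(step, taus)
spike_edges = msv_edges(spike, taus)
print(step_edges, spike_edges)
```

With these settings the step is marked only at the point of change, while the spike passes the unsmoothed test and is vetoed after the first smoothing cycle.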

The idea of using multiple scales in edge detection is not new. It is the following features
which  distinguish  the  MSV  algorithm  from  conventional  methods. 

• Edges are not defined as local maxima in the magnitude of the gradient, or equivalently,
as zero-crossings of the second derivative. Hence computation of second differences
is unnecessary.

•  All  of  the  difference  operations  and  threshold  tests  at  different  scales  can  be  performed 
on  the  same  physical  network. 

Both  features  represent  a  considerable  savings  in  circuitry,  which  is  crucial  if  the  network 
is to be designed for large image arrays.

By definition, edges exist between two pixels on a discrete two-dimensional array. However,
to avoid redefining the image grid, their locations are indicated in the output of the
edge detection network by setting a binary flag at the locations of the pixels between which
they occur. To simplify the discussion these marked pixels will be referred to as edge pixels,
although a more exact term would be edge-adjacent pixels.

(a) Response to an ideal step. Differences $y_0$ and $y_k$ are both above threshold at $n = 0$. The edge
is marked at the point of change in the unsmoothed data.

(b) Response to ideal point noise. The difference $y_0$ in the unsmoothed image is above threshold $\tau_0$
at $n = 0$, but the difference $y_k$ in the smoothed image is below the threshold $\tau_k$. Hence no edge is
marked.

Figure 5-1: Results of applying the multi-scale veto rule to an ideal step edge and to an
impulse.

It  should  be  noted  that  a  significant  consequence  of  defining  edges  by  the  multi-scale 
veto  rule  is  that  a  very  good  visual  approximation  to  the  original  image  can  be  reconstructed 
from the (possibly smoothed) brightnesses at the edge pixels. This operation can be performed
by a second processor which recomputes brightness values for non-edge locations by
interpolation  from  the  values  at  the  edge  pixels.  It  is  thus  possible  to  recover  a  smoothed 
version  of  the  original  image  from  noise-corrupted  input  while  maintaining  important  high 
frequency  information.  A  more  complete  analysis  of  this  aspect  of  the  MSV  algorithm, 
along  with  examples  of  reconstructed  noisy  images  can  be  found  in  the  original  paper  [74]. 

5.2  Other  Methods  and  the  Use  of  Multiple  Scales 

In  most  work  in  computer  vision,  edges  are  defined  as  the  loci  of  maxima  in  the  first 
derivative  of  brightness,  and  as  such  can  be  detected  from  zero-crossings  in  the  second 
derivative. This is the basis on which many edge and line detectors, such as the Marr-Hildreth
Laplacian-of-Gaussian (LOG) filter [70], the Canny edge detector [75], and the
Binford-Horn line finder [76], have been designed. The problem of finding edges by the
numerical  differentiation  of  images,  however,  is  ill-posed  [77].  Small  amounts  of  noise,  which 
are  amplified  by  differentiation,  can  displace  the  zero-crossings  or  introduce  spurious  ones. 
A  low-pass  smoothing,  or  regularization,  filter  must  be  applied  to  stabilize  the  solution. 

The  issue  of  scale  arises  because  features  in  the  image  generally  occur  over  a  range  of 
spatial  scales.  By  varying  the  passband  of  the  smoothing  filter,  one  can  select  the  size  of 
the  features  which  give  rise  to  edges.  Unfortunately,  the  information  which  permits  the 
edge  to  be  accurately  localized  to  the  feature  which  produced  it  is  thrown  out  with  the  high 
frequency components. Marr and Hildreth first proposed finding edges from the coincident
zero-crossings of different sized LOG filters. Witkin [78] introduced the notion of scale-space
filtering, in which the zero-crossings of the LOG are tracked as they move with scale
changes. These methods are a form of multi-scale veto, but the complexity of tracking the
zero-crossings makes them ill-suited for implementation in specialized VLSI.

An  alternative  solution  to  removing  noise  while  retaining  the  high  frequency  information 
associated  with  large  scale  features  is  to  apply  nonlinear  filtering.  The  median  filter  [79], 
for example, has long been used in image processing because it is particularly effective in
removing impulse, or ‘salt-and-pepper’, noise. An approach put forward in recent years is
the idea of edge detection, or more precisely image segmentation, as a problem in minimizing
energy functionals. The first proposal of this nature was the Markov Random Field
(MRF)  model  of  Geman  and  Geman  [35].  In  an  MRF  the  minimum  energy  state  is  the 
maximum  a  posteriori  (MAP)  estimate  of  the  energies  at  each  node  of  a  discrete  lattice. 
The  MAP  estimate  corresponds  to  a  given  configuration  of  neighborhoods  of  interaction. 
‘Line  processes’  are  introduced  on  the  lattice  to  inhibit  interaction  between  nodes  which 
have  significantly  different  prior  energies,  thereby  maintaining  these  differences  in  the  final 
solution. 

Mumford  and  Shah  [80]  studied  the  energy  minimization  problem  reformulated  in  terms 
of  deterministic  functionals  to  be  minimized  by  a  variational  approach.  Specifically,  they 
proposed finding optimal approximations of a general function $d(x,y)$, representing the
data, by differentiable functions $u(x,y)$ that minimize

$$J(u,\Gamma) = \mu^2 \iint (u - d)^2 \,dx\,dy + \iint |\nabla u|^2 \,dx\,dy + \nu|\Gamma| \tag{5.2}$$

where $u(x,y)$ is a piecewise smooth approximation to the original image and $\Gamma$ is a closed
set of singular points, in effect the edges, at which $u$ is allowed to be discontinuous. The
coefficients $\mu^2$ and $\nu$ are the weights on the different penalty terms.

Blake and Zisserman [81] referred to (5.2) as the ‘weak membrane’ model, since $J(u,\Gamma)$
resembles the potential energy function of an elastic membrane which is allowed to break
in some places in order to achieve a lower energy state. If $\Gamma$ is known, the solution to the
minimization problem can be found directly from the calculus of variations. However, the
problem is to find both $\Gamma$ and $u(x,y)$ by trading off the closeness of the approximation
and the number of discontinuities in the set. As a result the energy function $J(u,\Gamma)$ is
nonconvex and possesses many local stationary states that do not correspond to the global
minimum. Blake and Zisserman were able to circumvent the problem of multiple local
minima by developing a continuation method to solve the minimization problem iteratively.

The weak membrane model was one of the first methods to be implemented in analog
VLSI. Digital circuits for performing Gaussian convolution and edge detection began
appearing in the early 1980’s [82], [83]. The possibility of performing segmentation and
smoothing with analog circuitry, however, did not seem practical until the problem had
been posed in terms of a physical model. Harris [84] developed the first CMOS resistive
fuse circuit, which is a two-terminal nonlinear element that for small voltages behaves as
a linear resistor, but ‘breaks’ if the voltage across its terminals becomes too large. Several
implementations of resistive fuse networks have since been built to compute the minimization
of the discrete form of equation (5.2) [57], [85]. Keast [86] developed a discrete-time
version of the weak membrane model using CCDs to perform smoothing.

Circuit  implementations  of  the  weak  membrane  model  cannot  escape  the  non-convexity 
problem, however, and some effort is required to push them to the globally optimum solution.
The MSV model is similar to the weak membrane in that it also assumes an image
can be approximated by a collection of piecewise smooth functions. It is different, however,
in that it does not formulate edge detection as an energy minimization problem. The operations
of edge detection and image reconstruction are completely separate and independent
functions,  so  that  there  is  no  feedback  coupling  to  generate  alternate  local  minima.  Hence 
for  any  image  and  given  set  of  parameters,  there  is  a  unique  set  of  edges  which  will  be 
found. 

It  should  be  noted  that  the  edges  produced  by  the  MSV  network  are  not  as  ‘refined’  as 
those  produced  by  more  complex  methods  such  as  Canny’s  edge  detector  [75].  This  is  in 
part  due  to  the  way  edges  are  defined.  Since  many  feature  boundaries  are  more  like  ramps 
than  step  edges,  the  MSV  edges  are  often  several  pixels  thick.  It  is  also  due  to  the  need  to 
make  the  circuitry  as  simple  as  possible  in  order  to  minimize  silicon  area.  It  is  not  easy  to 
implement  contour  filling  or  thinning  algorithms  with  simple  circuits.  The  edges  produced 
by  the  MSV  algorithm  are  nonetheless  functionally  useful  for  many  early  vision  tasks,  and 
in  particular,  they  will  be  shown  to  be  useful  for  feature  matching. 

5.3  Circuit  Models 

It is not necessary to build a multi-layered processor in order to implement the multi-scale
veto rule. By including time as a dimension, a single smoothing network with a
controllable  space  constant  can  be  used.  It  is  well  known  that  resistive  grids,  such  as  the 
one  shown  in  Figure  5-2,  can  compute  an  analog  smoothing  function.  The  network  shown  is 
one-dimensional;  however,  it  can  be  easily  extended  to  two  dimensions  by  connecting  it  via 
transverse resistors to parallel 1-D networks. By equating the current through the vertical
resistors connected to the node voltage sources $d_i$ to the sum of the currents leaving the
node through the horizontal resistors, one arrives at the resistive grid equation:

$$\frac{d_i - u_i}{R_v} = \sum_k \frac{u_i - u_k}{R_h} \tag{5.3}$$

where the subscript $k$ is an index over the nearest neighbors of node $i$. The continuous 2-D
approximation to this circuit is the diffusion equation

$$u - \lambda^2 \nabla^2 u = d \tag{5.4}$$

with

$$\lambda = \sqrt{\frac{R_v}{R_h}} \tag{5.5}$$

which is the space constant, or characteristic length, over which a point source input will
be smoothed. By varying the values of $R_v$, it is therefore possible to control the bandwidth
of the effective low-pass filter applied to the data.

Figure 5-2: 1-D resistive smoothing network with controllable space constant.
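
The effect of the vertical resistance on the smoothing width can be checked numerically. The sketch below is an assumption-laden illustration: it writes the current balance described above as (d[i] - u[i])/r_v = sum over neighbors of (u[i] - u[k])/r_h on a 1-D chain, solves it by Gauss-Seidel relaxation, and compares the response to a point source for two values of r_v. The function name `solve_grid`, the chain length and the resistor values are all hypothetical.

```python
def solve_grid(d, r_v, r_h, sweeps=2000):
    """Node voltages u of a 1-D resistive chain driven by sources d.

    Each node i satisfies (d[i] - u[i]) / r_v = sum_k (u[i] - u[k]) / r_h,
    where k runs over the one or two nearest neighbors of i.
    """
    u = list(d)
    n = len(d)
    for _ in range(sweeps):
        for i in range(n):
            nbrs = [u[j] for j in (i - 1, i + 1) if 0 <= j < n]
            u[i] = (d[i] / r_v + sum(nbrs) / r_h) / (1 / r_v + len(nbrs) / r_h)
    return u


n, c = 41, 20
d = [0.0] * n
d[c] = 1.0  # point-source input at the center node

narrow = solve_grid(d, r_v=1.0, r_h=1.0)
wide = solve_grid(d, r_v=4.0, r_h=1.0)

# Ratio of the neighboring node voltage to the center voltage: closer to 1
# means the point source is spread over a longer distance.
decay_narrow = narrow[c + 1] / narrow[c]
decay_wide = wide[c + 1] / wide[c]
print(decay_narrow, decay_wide)
```

Under these assumptions the response falls off more slowly from node to node when r_v is larger, which is the sense in which the vertical resistors set the space constant.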

A  practical  way  to  build  a  controllable  smoothing  network  is  to  simulate  the  resistors 
with  charge-coupled  devices  (CCDs).  CCDs  are  best  known  for  their  role  as  image  sensors, 
but  they  are  also  capable  of  performing  more  advanced  signal  processing.  CCDs  operate  in 
the  manner  of  a  bucket  brigade,  where  the  ‘buckets’  are  potential  wells  under  polysilicon 
gates,  and  the  depths  of  the  wells  are  determined  by  the  voltages  applied  to  the  gates.  The 
‘water’  in  the  buckets  is  the  signal  charge  which  can  be  transferred,  mixed,  and  separated 
between  the  potential  wells  by  varying  the  sequences  of  the  clock  phases  which  drive  the 
gates.  CCDs  are  built  by  juxtaposing  gates  of  alternating  layers  of  polysilicon.  When  used 
as image sensors, the gates are held at a high potential to collect the charge that naturally
occurs when photons of energy greater than the bandgap of silicon hit the device and create
electron-hole pairs. After a suitable integration time, generally a millisecond or so, the
CCDs then function as analog shift registers to move the signal charges out of the camera.

Figure 5-4: Conceptual model of the ‘Edge precharge circuit’.

The  basic  layout  of  the  2-D  network  required  to  use  CCDs  for  the  MSV  edge  detection 
algorithm is shown in Figure 5-3. It consists of a grid of orthogonal horizontal and vertical
transfer channels with circuitry placed between the nodes to compute differences and
perform  the  threshold  tests.  The  numbers  on  the  gates  signify  the  different  clock  phases 
which  are  used  to  move  signal  charges  in  the  array.  The  structure  of  this  network  is  the 
same  as  that  developed  by  Keast  to  implement  a  CCD  ‘resistive  fuse’  network  [86],  [57].  By 
appropriately  sequencing  the  clock  phases,  this  array  can  perform  smoothing  by  averaging 
the signal charge held under each node with each of its neighbors. Specifically, it applies
the convolution kernel

$$\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \tag{5.6}$$

to the image signal with each smoothing cycle. After two cycles the image has been effectively
convolved with

$$\frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \tag{5.7}$$

and so on. The bandwidth of the smoothing filter is thus controlled by the number of cycles
performed.
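
The two-cycle kernel follows from convolving the one-cycle kernel with itself, which can be verified in a few lines. The sketch works with the integer weights only, so the scale factors 1/16 and 1/16 x 1/16 = 1/256 are left implicit; the helper `convolve_full` is a hypothetical name for a plain full 2-D convolution.

```python
K3 = [[1, 2, 1],
      [2, 4, 2],
      [1, 2, 1]]  # integer weights of the one-cycle kernel (scale 1/16)


def convolve_full(a, b):
    """Full 2-D convolution of two small kernels given as lists of rows."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0] * cols for _ in range(rows)]
    for i, row_a in enumerate(a):
        for j, va in enumerate(row_a):
            for p, row_b in enumerate(b):
                for q, vb in enumerate(row_b):
                    out[i + p][j + q] += va * vb
    return out


K5 = convolve_full(K3, K3)          # weights after two smoothing cycles
total = sum(sum(row) for row in K5) # normalizing constant for the 5x5 kernel
for row in K5:
    print(row)
print(total)
```

Applying `convolve_full(K5, K3)` in the same way gives the 7x7 weights for three cycles.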

The  primary  difference  between  Keast’s  design  and  the  MSV  network  is  in  the  functions 
performed by the circuits placed between the nodes. The multi-scale veto rule is implemented
by the edge precharge circuits, shown in Figure 5-4 and indicated by the boxes
labeled EPC in Figure 5-3. In each of these, a capacitor is initially charged with an ‘edge’
signal. At each smoothing cycle, the absolute value of the difference between the node voltages
is compared to a threshold; and if the threshold is greater, the capacitor is discharged.

The  complete  execution  of  the  multi-scale  veto  algorithm  consists  of  the  following  steps: 
The  array  is  initialized  by  transferring  signal  charge  proportional  to  image  brightness  under 
each  node  gate  (pixel)  and  by  charging  the  edge  capacitors.  The  signal  charge  is  formed 
either  by  direct  acquisition  using  the  CCD  array,  or  by  loading  the  pixel  values  from  an 
off-chip sensor. Several smoothing cycles, approximately 5 to 10, with the accompanying threshold tests,
are  then  performed. 

When  these  are  completed,  the  edge  charges  from  the  four  precharge  circuits  connected 
to  each  node  are  tested;  and,  if  any  of  them  is  non-zero,  i.e.,  if  an  edge  was  detected  between 
the  node  and  one  of  its  four  neighbors,  a  binary  value  is  set  at  the  node  to  indicate  that  it 
is  an  edge  pixel. 
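
The sequence of steps can be mimicked in software as follows. This is a behavioral sketch under stated assumptions, not a model of the CCD hardware: booleans stand in for the edge capacitors, the first threshold test is assumed to act on the unsmoothed data, borders are replicated, and the names `smooth2d` and `msv_edge_map`, the test image and the thresholds are hypothetical.

```python
def smooth2d(img):
    """One smoothing cycle with the 3x3 kernel [1 2 1; 2 4 2; 1 2 1]/16."""
    h, w = len(img), len(img[0])
    wts = (1, 2, 1)

    def at(r, c):  # replicate the border pixels
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[sum(wts[dr + 1] * wts[dc + 1] * at(r + dr, c + dc)
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 16
             for c in range(w)] for r in range(h)]


def msv_edge_map(img, thresholds):
    """Binary edge map: a pixel is set if any of its neighbor pairs survives."""
    h, w = len(img), len(img[0])
    # One 'edge capacitor' per neighboring pair, initially charged.
    horiz = [[True] * (w - 1) for _ in range(h)]  # between (r, c) and (r, c+1)
    vert = [[True] * w for _ in range(h - 1)]     # between (r, c) and (r+1, c)
    for tau in thresholds:
        for r in range(h):
            for c in range(w - 1):
                if abs(img[r][c + 1] - img[r][c]) <= tau:
                    horiz[r][c] = False           # discharge: edge vetoed
        for r in range(h - 1):
            for c in range(w):
                if abs(img[r + 1][c] - img[r][c]) <= tau:
                    vert[r][c] = False
        img = smooth2d(img)
    edge = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w - 1):
            if horiz[r][c]:
                edge[r][c] = edge[r][c + 1] = True
    for r in range(h - 1):
        for c in range(w):
            if vert[r][c]:
                edge[r][c] = edge[r + 1][c] = True
    return edge


# Test image: vertical step between columns 7 and 8 plus one isolated bright pixel.
img = [[1.0 if c >= 8 else 0.0 for c in range(12)] for _ in range(7)]
img[3][2] = 1.0
taus = [0.6 * g for g in (1.0, 0.5, 0.375)]  # assumed step-edge model
edges = msv_edge_map(img, taus)
for row in edges:
    print("".join("#" if e else "." for e in row))
```

In this example only the two columns adjacent to the step are flagged; the isolated pixel is vetoed after the first smoothing cycle.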

5.4  Choosing  the  Parameters 

It  might  seem  that  the  number  of  free  parameters — the  different  thresholds  for  each 
smoothing  filter,  as  well  as  the  number  of  smoothing  cycles — that  need  to  be  specified 
in  order  to  apply  the  multi-scale  veto  rule  would  make  the  method  impractical  or  even 
arbitrary.  However,  there  are  simple  ways  to  choose  the  parameters  based  on  the  types  of 
features  which  one  wishes  to  retain. 

The edges which are marked by the edge detection network are those which pass the
threshold test at all smoothing cycles. Following the same notation used in the example
given earlier, let $k$ denote the number of smoothing cycles performed, and let $\tau_k$ denote the
threshold for the $k$th cycle. Given the convolution kernel (5.6) which is implemented by
the smoothing network at each cycle, the attenuation factors $G_k$ for the difference signals
corresponding to several idealized features are computed as a function of smoothing and
given in Table 5.1.

                                   Attenuation Factors
Smoothing Cycle:              1        2        3        4        5
Horizontal step edge        0.500    0.375    0.313    0.273    0.246
Diagonal step edge          0.375    0.273    0.226    0.196    0.176
Horizontal 1-pixel line     0.250    0.125    0.078    0.055    0.041
Diagonal 1-pixel line       0.125    0.055    0.032    0.022    0.016
Horizontal 2-pixel line     0.500    0.313    0.219    0.164    0.129
Diagonal 2-pixel line       0.313    0.164    0.105    0.074    0.056
1-pixel impulse noise       0.125    0.047    0.024    0.015    0.010
4-pixel square impulse      0.375    0.195    0.120    0.081    0.058
Horizontal 3-pixel ramp     0.750    0.688    0.641    0.602    0.568

Table 5.1: Attenuation factors for different types of features as a function of smoothing

The  ideal  step  edge  refers  to  a  two-dimensional  feature  which  is  infinite  in  one  dimension 
but  has  an  abrupt  change  from  one  pixel  to  the  next  in  the  other  dimension.  The  ideal  line 
corresponds  to  back-to-back  step  edges  facing  in  opposite  directions  so  that  its  1-D  cross- 
section  resembles  that  of  the  impulse  in  Figure  5-1.  The  labels  ‘1-pixel  line’  and  ‘2-pixel 
line’  in  Table  5.1  refer  to  the  width  of  the  1-D  impulse.  Impulse  noise  is  a  local  abrupt 
change  in  brightness  which  is  finite  in  both  dimensions.  Here,  the  labels  ‘1-pixel  impulse’ 
and  ‘4-pixel  square  impulse’  refer  to  the  area  of  the  local  discontinuity.  Finally,  the  ideal 
ramp is a feature similar to a step edge, but for which the change in brightness occurs
over  several  pixels  (in  this  case  3)  rather  than  abruptly.  Some  graphic  examples  of  these 
features  are  shown  in  Figure  5-5. 
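
Several rows of Table 5.1 can be reproduced from the kernel alone. The sketch below rests on two observations that are assumptions of this example rather than statements from the text: applying the 1-D kernel [1, 2, 1]/4 a total of k times gives binomial weights C(2k, k+j)/4^k at offset j, and the attenuation is measured at the original location of the brightness change. The 2-D kernel is separable, so the isolated impulse is handled as a product of 1-D weights. The names `b` and `FEATURES` are hypothetical.

```python
from math import comb


def b(k, j):
    """Weight at offset j of the 1-D kernel [1, 2, 1]/4 applied k times."""
    return comb(2 * k, k + j) / 4 ** k if abs(j) <= k else 0.0


# Attenuation of the difference signal at the original feature location.
FEATURES = {
    # Difference of a step is a unit impulse, so G_k is the center weight.
    "Horizontal step edge": lambda k: b(k, 0),
    # A 1-pixel line smooths to b(k, j); difference between center and neighbor.
    "Horizontal 1-pixel line": lambda k: b(k, 0) - b(k, 1),
    # A 2-pixel line is two adjacent impulses; the shared term b(k, 1) cancels.
    "Horizontal 2-pixel line": lambda k: b(k, 0) - b(k, 2),
    # Isolated pixel in 2-D: separable kernel, center minus 4-neighbor.
    "1-pixel impulse noise": lambda k: b(k, 0) * (b(k, 0) - b(k, 1)),
}

factors = {name: [g(k) for k in range(1, 6)] for name, g in FEATURES.items()}
for name, values in factors.items():
    print(f"{name:26s}", "  ".join(f"{v:.4f}" for v in values))
```

The printed values agree with the corresponding table rows to within rounding in the third decimal place. Diagonal features and the ramp would need the full 2-D kernel or a different measurement point and are not covered here.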

We also distinguish between horizontal features, which are aligned with the rectangular
pixel grid, and diagonal ones, which are oriented at 45° with respect to the grid.
It  can  be  seen  from  the  values  in  the  table  that  diagonal  features  are  attenuated  somewhat 
more  than  horizontal  ones  due  to  the  nature  of  the  smoothing  operator,  and  consequently, 
edges aligned with the grid are favored over skewed edges. An isotropic operator could
be implemented with a hexagonally connected network. However, based on the numerous
simulations  performed  to  produce  edges  for  the  matching  algorithm,  the  added  complexity 
in  the  design  does  not  seem  to  be  warranted  by  the  slight  improvement  in  the  results  that 
an  isotropic  operator  would  provide. 

The  values  in  Table  5.1  can  be  used  as  a  guide  for  setting  the  parameters  for  more  general 
types  of  features.  As  a  specific  example,  suppose  we  want  to  retain  only  the  boundaries 
from large objects in the image and remove all small scale features. The threshold $\tau_0$ can
be set as a function of the contrast in the image. We can perform 5 smoothing cycles and
set $\tau_5 = 0.246\,\tau_0$. Features resembling step changes in brightness which passed the threshold
at $k = 0$ will have little trouble passing the test at $k = 5$, while features resembling 4-pixel
square impulses will need to have an original difference greater than $4.2\,\tau_0$ in order to pass.
A simpler method to generate all the thresholds is to choose one idealized feature as a model
and to compute

$$\tau_k = G_{k,f}\,\tau_0 \tag{5.8}$$

where $G_{k,f}$ is the attenuation factor for the model feature at the $k$th smoothing cycle. For
the  previous  example,  the  model  used  was  the  horizontal  step  edge.  In  another  case  in 
which  we  only  want  to  eliminate  impulse  noise  while  retaining  thin  lines,  we  might  choose 
the  diagonal  2-pixel  line  as  a  model.  In  an  actual  implementation,  the  values  in  Table  5.1 
can  be  held  in  a  ROM  and  supplied  to  the  MSV  processor  at  each  smoothing  cycle. 
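
The threshold rule and the figure of roughly 4.2 times the base threshold quoted for the 4-pixel square impulse can be checked with the values from Table 5.1. In this sketch the attenuation at cycle 0 is taken to be 1 (unsmoothed data), the base threshold is normalized to 1, and the helper names `thresholds` and `min_contrast_to_pass` are hypothetical.

```python
# Attenuation factors for k = 0..5; the k = 0 entry (no smoothing) is assumed to be 1.
G_STEP = [1.0, 0.500, 0.375, 0.313, 0.273, 0.246]    # horizontal step edge
G_SQUARE = [1.0, 0.375, 0.195, 0.120, 0.081, 0.058]  # 4-pixel square impulse


def thresholds(tau0, model):
    """Thresholds for every cycle: tau_k = G_k(model) * tau_0."""
    return [g * tau0 for g in model]


def min_contrast_to_pass(taus, feature):
    """Smallest initial difference with which `feature` passes every threshold test.

    After k cycles the difference is G_k * A, so passing requires A > tau_k / G_k
    for all k; the largest of these ratios is the binding one.
    """
    return max(t / g for t, g in zip(taus, feature))


taus = thresholds(1.0, G_STEP)               # step-edge model, tau_0 = 1
needed = min_contrast_to_pass(taus, G_SQUARE)
print(taus)
print(needed)                                # in units of tau_0
```

The binding test is the last one, since the square impulse is attenuated progressively more than the step edge as smoothing proceeds.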

5.5  Simulations 

The  results  of  simulating  the  MSV  algorithm  on  four  images  are  shown  in  Figures  5-6 
and  5-7.  These  images  are  from  two  motion  sequences,  one  simulated  and  one  real,  which 
will  be  used  again  in  the  following  chapters  to  demonstrate  the  matching  procedure  and  the 
results  of  the  motion  algorithm. 

In  the  first  sequence,  Figure  5-6,  the  left  image  is  a  picture  of  a  poster  (of  Neil  Armstrong) 
taken  by  a  Panasonic  CCD  camera,  while  the  right  image  was  generated  by  a  computer- 
simulated  motion  applied  to  the  first.  The  reason  for  generating  simulated  motion  is  to 
be  able  to  test  the  results  of  the  motion  computation  against  known  values.  The  motion 
simulation  program  assumes  an  image  of  a  planar  surface  at  a  user-supplied  depth  and 
orientation. The focal length, principal point, and x,y pixel spacing are input to the
program to compute ray directions using the pinhole camera model. For the astronaut
images,  which  are  400x400  pixels,  the  focal  length  and  pixel  spacing  were  such  that  the 
effective  field  of  view,  measured  from  the  optical  axis,  was  55°.  In  the  right  image,  the 
surface  is  modeled  as  a  frontal  plane  (parallel  to  the  image  plane)  at  a  depth  of  10  (baseline) 
units,  and  the  motion  was  a  translation  of  1  unit  in  the  positive  x  direction  with  a  5°  rotation 
about  the  y  axis. 

In  the  second  sequence,  Figure  5-7,  both  images  were  taken  by  a  Cohu  digital  CCD 
camera  rigidly  mounted  on  a  movable  carriage  that  could  be  translated  along  a  fixed  rail. 
The  carriage  assembly  could  be  rotated  on  both  the  vertical  and  horizontal  axes  so  that 
the camera could be oriented in any direction, with positional accuracy of better than 0.1°
on each axis. The advantage of the digital camera is that each pixel corresponds exactly to
one  sensor  location,  and  there  is  no  frame  grabber  in  the  path  to  resample  and  resize  the 
data.  The  internal  calibration  matrix  is  thus  very  close  to  that  given  by  the  geometry  of  the 
image sensor, which is a 6.4 mm × 4.8 mm CCD array with 756 (horizontal) and 484 (vertical)
pixels. A 4.8 mm lens was used to give an approximately 40° field of view, measured from the
optical axis. The motion for the pair of images shown was a translation in the x direction
followed by a rotation of 5° about the z axis. It should be emphasized, however, that this
corresponds  to  the  motion  of  the  camera  with  respect  to  the  motion  stage  coordinate  system 
and  not  with  respect  to  its  own  coordinate  system. 

In  applying  the  edge  detection  algorithm,  step  edge  models  were  used  for  both  sets  of 
images,  and  7  smoothing  cycles  were  applied.  The  results  are  shown  as  the  binary  images 
below  the  originals.  It  should  be  noted  that  the  apparent  thickness  of  the  edges  is  due  in 
part  to  the  method  of  marking  both  pixels  on  either  side  of  the  change  in  brightness,  and  in 
part  to  the  presence  of  many  brightness  gradations  (ramp-like  edges)  in  the  scenes.  We  will 
continue  with  these  same  image  sequences  in  the  following  chapters  for  testing  the  matching 
procedure and the motion algorithm.

Figure 5-6: Simulated motion sequence: Frontal plane at 10 (baseline) units. Motion is
given by: $\mathbf{b} = (1, 0, 0)$, $\theta = 5^\circ$, $\hat{\omega} = (0, 1, 0)$.

Figure 5-7: Real motion sequence: Motion with respect to motion stage coordinate system
(not camera system): $\mathbf{b} = (1, 0, 0)$, $\theta = 5^\circ$, $\hat{\omega} = (0, 0, 1)$.


Chapter  6 


The  Matching  Procedure 


There are two important aspects to the problem of finding the set of point correspondences
needed for computing motion. The first is that only a sparse set of very reliable
matches  distributed  across  the  field  of  view  is  needed.  The  second  is  that,  since  there  is  no 
fixed  relation  between  the  relative  positions  of  the  corresponding  points  in  the  left  and  right 
images,  as  in  the  case  when  the  epipolar  geometry  is  known,  the  search  must  usually  be 
conducted  over  a  large  area.  Consequently,  the  matching  procedure  must  have  both  a  good 
detection  rate  as  well  as  a  very  low  false  alarm  rate  to  minimize  the  number  of  incorrect 
matches. 

In  the  first  part  of  this  chapter,  I  will  derive  the  basic  procedure  which  will  be  used  and 
show  how  thresholds  can  be  set  to  ensure  adequate  detection  and  false  alarm  rates.  I  will 
then  present  the  results  of  applying  the  procedure  to  the  edge  maps  from  the  sequences 
shown  in  the  previous  chapter. 

6.1  Finding  the  Best  Match 

Following the usual procedure for block matching, we divide one image into $N$, possibly
overlapping, $M \times M$ blocks. Since the edge maps are binary, we can simplify the equations
by adopting the following notation. Let $i$, $1 \le i \le N$, denote the $i$th block and define

$P$ = total number of pixels in each block = $M^2$.

$B_i$ = set of pixels corresponding to an edge in the $i$th block of the left, or base,
edge map.

$\bar{B}_i$ = the set complement of $B_i$ within the $i$th block.

$S_{jk}$ = set of edge pixels in the $M \times M$ block centered at coordinate location $(j, k)$
in the right, or second, edge map.

$\bar{S}_{jk}$ = the set complement of $S_{jk}$.

When it is convenient, the lowercase variables $s$, $\bar{s}$, $b$, $\bar{b}$ will be used to refer to individual
pixels within the four sets.

Using this notation, similarity measures can now be expressed by ordinary set operations.
The normalized binary correlation function is given by

$$C_{ijk} = \frac{\|S_{jk} \cap B_i\|}{\sqrt{\|S_{jk}\| \cdot \|B_i\|}} \tag{6.1}$$

and the absolute value of difference function by

$$V_{ijk} = \frac{1}{P}\left(\|\bar{S}_{jk} \cap B_i\| + \|S_{jk} \cap \bar{B}_i\|\right) \tag{6.2}$$

which has been normalized so that $0 \le V_{ijk} \le 1$. Unlike the case of gray level images, these
measures are always equivalent for binary data since they are related by the equality

$$P\,V_{ijk} = \|S_{jk}\| + \|B_i\| - 2\left(\|S_{jk}\| \cdot \|B_i\|\right)^{1/2} C_{ijk} \tag{6.3}$$

The absolute value of difference, $V_{ijk}$, however, is simpler to compute and so is preferred
over the correlation function.
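
Both measures reduce to counting bits, as the following sketch shows for two small binary blocks. It assumes that the correlation is normalized by the square root of the product of the two edge-pixel counts; the function name `similarity` and the example blocks are hypothetical.

```python
from math import sqrt


def similarity(base_block, second_block):
    """Return (C, V) for two equal-size binary blocks given as lists of rows.

    C is the normalized binary correlation, V the normalized absolute
    value of difference (fraction of pixels at which the blocks disagree).
    """
    b = [p for row in base_block for p in row]
    s = [p for row in second_block for p in row]
    n_b, n_s = sum(b), sum(s)                      # edge-pixel counts
    n_both = sum(x & y for x, y in zip(b, s))      # pixels set in both blocks
    n_diff = sum(x ^ y for x, y in zip(b, s))      # pixels set in exactly one
    c = n_both / sqrt(n_b * n_s) if n_b and n_s else 0.0
    v = n_diff / len(b)
    return c, v


B = [[0, 1, 1, 0],
     [0, 1, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
S = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]

C, V = similarity(B, S)
P = 16
n_b = sum(map(sum, B))
n_s = sum(map(sum, S))
# The two measures carry the same information once the counts are known:
identity_gap = P * V - (n_s + n_b - 2 * sqrt(n_s * n_b) * C)
print(C, V, identity_gap)
```

The difference measure needs only an exclusive-or and a count per pixel, with no multiplication or square root, which is the sense in which it is simpler to compute.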

The most likely position of the match occurs at the minimum value of $V_{ijk}$, $V^*$. The
decision to accept or reject the best match is based on the result of a comparison

$$V^* < \tau \tag{6.4}$$

Equation (6.4) is in the form of a binary hypothesis test that selects between the hypotheses
$H_0$, that the match is false, and $H_1$, that the match is correct. The decision threshold $\tau$ is
chosen to achieve a given detection or false alarm rate. Although only $V^*$ must satisfy (6.4),
any offset at which $V_{ijk} < \tau$ should be considered as a potential match.

Formally, the detection rate is defined as the probability that $V_{ijk}$ will be below the threshold given that $H_1$ is true,

$$P_D = \Pr(V_{ijk} < \tau \mid H_1) \tag{6.5}$$

while the false alarm rate is the probability that the test will be passed given that the match is incorrect [61],

$$P_F = \Pr(V_{ijk} < \tau \mid H_0) \tag{6.6}$$

Figure 6-1: Threshold selection for a decision rule that chooses between two hypotheses $H_0$ and $H_1$.

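As indicated by the diagram in Figure 6-1, it is not necessary to explicitly compute $P_D$ and $P_F$ to determine $\tau$, but only to find the mean and variance of $V_{ijk}$ under the hypotheses $H_0$ and $H_1$. For a threshold satisfying $\mu_{H_1} < \tau < \mu_{H_0}$, the Chebyshev inequality [87] ensures that

$$P_D = 1 - \Pr(V_{ijk} \ge \tau \mid H_1) \ge 1 - \frac{\mathrm{Var}(V_{ijk} \mid H_1)}{(\tau - \mu_{H_1})^2} \tag{6.7}$$

and

$$P_F = \Pr(V_{ijk} < \tau \mid H_0) \le \frac{\mathrm{Var}(V_{ijk} \mid H_0)}{(\tau - \mu_{H_0})^2} \tag{6.8}$$

where $\mu_{H_0}$ and $\mu_{H_1}$ denote the means of $V_{ijk}$ under the two hypotheses.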


To compute $\mu_{H_0}$ we assume that the distributions of edge pixels within the two blocks are independent since they correspond to different features. We further assume that the values of each pixel within the same block are independent and identically distributed (i.i.d.) Bernoulli random variables. Although the values of neighboring edge pixels are certainly correlated, the assumption of independence is reasonable over the entire block if it is large enough. Let $p_b$ denote the probability that any pixel in the block from the base image is a 1, and let $p_s$ denote the same probability for any pixel in the block from the second image. Then

$$\mu_{H_0} = E[V_{ijk} \mid H_0] = p_s(1 - p_b) + p_b(1 - p_s) \tag{6.9}$$

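The variance is found by rewriting (6.2) as

$$V_{ijk} = \frac{1}{M^2}\left(\|B_i\| + \|S_{jk}\| - 2\,\|S_{jk} \cap B_i\|\right) \tag{6.10}$$

Then

$$\begin{aligned}
\sigma^2_{H_0} = \mathrm{Var}(V_{ijk} \mid H_0) &= \frac{1}{M^4}\Big[\mathrm{Var}(\|S_{jk}\|) + \mathrm{Var}(\|B_i\|) + 4\,\mathrm{Var}(\|S_{jk} \cap B_i\|) \\
&\qquad\quad - 4\,\mathrm{Cov}(\|S_{jk}\|, \|S_{jk} \cap B_i\|) - 4\,\mathrm{Cov}(\|B_i\|, \|S_{jk} \cap B_i\|)\Big] \\
&= \frac{1}{M^4}\Big[M^2 p_s(1 - p_s) + M^2 p_b(1 - p_b) + 4M^2 p_s p_b(1 - p_s p_b) \\
&\qquad\quad - 4M^2 p_b p_s(1 - p_s) - 4M^2 p_s p_b(1 - p_b)\Big] \\
&= \frac{1}{M^2}\Big[p_s(1 - p_s) + p_b(1 - p_b) - 4 p_s p_b(1 - p_s)(1 - p_b)\Big]
\end{aligned} \tag{6.11}$$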
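The closed form in (6.11) can be cross-checked numerically. Under the i.i.d. assumptions each pixel of the difference image is itself a Bernoulli variable with probability $\mu_{H_0}$ from (6.9), so the variance of the normalized count should also equal $\mu_{H_0}(1 - \mu_{H_0})/M^2$. The following is a minimal sketch of that check, assuming $M \times M$ blocks; the function names are hypothetical.

```python
def h0_mean(p_s, p_b):
    """Mean of V under H0, as in (6.9)."""
    return p_s * (1 - p_b) + p_b * (1 - p_s)

def h0_variance(p_s, p_b, m):
    """Variance of V under H0 using the final line of (6.11)."""
    return (p_s * (1 - p_s) + p_b * (1 - p_b)
            - 4 * p_s * p_b * (1 - p_s) * (1 - p_b)) / m ** 2

def h0_variance_direct(p_s, p_b, m):
    """Same variance, treating each differing pixel as Bernoulli(mu_H0)."""
    p = h0_mean(p_s, p_b)
    return p * (1 - p) / m ** 2

# Compare the two expressions over a small grid of edge densities.
grid = [0.05, 0.1, 0.2, 0.35, 0.5]
max_gap = max(abs(h0_variance(ps, pb, 24) - h0_variance_direct(ps, pb, 24))
              for ps in grid for pb in grid)
```

Agreement of the two expressions is an algebraic identity, so `max_gap` should be zero up to rounding.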

When $H_1$ is true, the distributions of edge pixels in the two blocks are no longer independent. Ideally, they should be identical, but due to the presence of noise, $\mu_{H_1}$ will not be zero. If we assume a Bernoulli noise process, $n$, with probability $p_n$ such that

$$s = b\bar{n} \lor \bar{b}n \tag{6.12}$$

then

$$b\bar{s} \lor \bar{b}s = n \tag{6.13}$$

and hence

$$\mu_{H_1} = E[V_{ijk} \mid H_1] = p_n \tag{6.14}$$

and

$$\sigma^2_{H_1} = \mathrm{Var}(V_{ijk} \mid H_1) = \frac{p_n(1 - p_n)}{M^2} \tag{6.15}$$

One of the difficulties with using equation (6.9) to determine $\tau$ is that it requires knowledge of both $p_s$ and $p_b$. These can be estimated from a local analysis of the two edge maps; however, doing so greatly complicates the procedure. A simpler test can be formulated by counting only the edge pixels in each of the $N$ blocks from the base edge map. Since these blocks do not change, this needs to be done only once.

Given $\|B_i\|$, the distribution of $V_{ijk}$ under the null hypothesis changes slightly. We find that

$$\mu_{H_0,\|B\|} = E[V_{ijk} \mid H_0, \|B_i\|] = p_s\left(1 - \frac{\|B_i\|}{M^2}\right) + \frac{\|B_i\|}{M^2}(1 - p_s) \tag{6.16}$$

and

$$\sigma^2_{H_0,\|B\|} = \mathrm{Var}[V_{ijk} \mid H_0, \|B_i\|] = \frac{p_s(1 - p_s)}{M^2} \tag{6.17}$$

If we consider only blocks for which $\|B_i\|/M^2 < 1/2$ then, since $p_s > 0$, it is always the case that

$$E[V_{ijk} \mid H_0, \|B_i\|] > \frac{\|B_i\|}{M^2} \tag{6.18}$$

and hence a reasonable test can be formulated as

$$V^* < \alpha\,\frac{\|B_i\|}{M^2}, \qquad (0 < \alpha < 1) \tag{6.19}$$

subject to

$$\beta p_n < \frac{\|B_i\|}{M^2} < \frac{1}{2}, \qquad (\beta > 1) \tag{6.20}$$

where $\alpha$ is chosen to ensure a low false alarm rate and $\beta$ determines the detection rate given $\alpha$ and assuming a value for $p_n$. Equations (6.14)-(6.17) can be used in conjunction with equations (6.7) and (6.8) to set values for $\alpha$ and $\beta$. In the simulations of the matching procedure which have been performed, typical values are $\alpha = 0.5$ and $\beta p_n = 0.15$.
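The way the threshold test, the edge-density restriction and the Chebyshev bounds fit together can be sketched as follows. This is only an illustration under stated assumptions: blocks are $M \times M$ pixels, the means and variances follow the Bernoulli model described above, and the values chosen for `p_s` and `p_n` are invented for the example rather than taken from the simulations. All function names are hypothetical.

```python
def chebyshev_bounds(tau, mean_h0, var_h0, mean_h1, var_h1):
    """Bounds from (6.7) and (6.8): (lower bound on P_D, upper bound on P_F)."""
    if not mean_h1 < tau < mean_h0:
        raise ValueError("threshold must lie between the two means")
    pd_lower = 1.0 - var_h1 / (tau - mean_h1) ** 2
    pf_upper = var_h0 / (tau - mean_h0) ** 2
    return max(0.0, pd_lower), min(1.0, pf_upper)

def block_is_eligible(edge_count, m, beta_pn=0.15):
    """Edge-density restriction (6.20) on a base block of m x m pixels."""
    density = edge_count / m ** 2
    return beta_pn < density < 0.5

def accept_match(v_min, edge_count, m, alpha=0.5):
    """Threshold test (6.19): compare the best score with alpha * ||B|| / M^2."""
    return v_min < alpha * edge_count / m ** 2

# Illustrative numbers only: p_s and p_n are assumed, not measured values.
m, edge_count, p_s, p_n, alpha = 24, 115, 0.2, 0.05, 0.5
density = edge_count / m ** 2
tau = alpha * density
mean_h0 = p_s * (1 - density) + density * (1 - p_s)
var_h0 = p_s * (1 - p_s) / m ** 2
mean_h1, var_h1 = p_n, p_n * (1 - p_n) / m ** 2
pd_lower, pf_upper = chebyshev_bounds(tau, mean_h0, var_h0, mean_h1, var_h1)
```

With these assumed numbers the threshold sits well between the two means, so the bound on the detection rate is close to one and the bound on the false alarm rate is close to zero. Only the edge count of the base block is needed at run time, which is the simplification argued for above.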

Although  this  procedure  categorically  rejects  any  block  in  the  base  map  which  does  not 
first  satisfy  (6.20),  the  gain  in  simplicity,  which  directly  impacts  circuit  complexity,  is  worth 
the  loss  of  a  few  correspondence  points.  The  loss  due  to  the  upper  bound  of  1/2  in  (6.20) 
should  not  be  too  great  since,  on  average,  edges  cover  much  less  than  half  of  the  image.  The 
lower  bound  would  be  necessary  under  any  circumstances  to  avoid  trying  to  match  blocks 
containing  very  few  features. 

6.2 Other Tests

The previous analysis was based on the assumption that the pattern of edges in the block being matched was unique. When this assumption is true, the absolute value of difference function has a very sharp minimum at the location of the correct match. The variance of the function, as given by equation (6.17), will be very small if $M$ is large enough since, as is easily shown,

$$\sigma^2_{H_0,\|B\|} \le \frac{1}{4M^2} \tag{6.21}$$

If $M = 24$, for example, then $\sigma_{H_0,\|B\|} \le 1/48 \approx 0.021$. It is not difficult to choose $\tau$ so that $(\mu_{H_0,\|B\|} - \tau)$ is many times larger than $\sigma_{H_0,\|B\|}$.

It  is  often  the  case,  however,  that  the  edge  patterns  are  not  unique.  Repeating  patterns 
occur  frequently  in  natural  scenes.  The  most  common  example  is  when  the  block  contains 
linear  segments  that  are  part  of  larger  entities,  for  instance  the  side  of  a  door  or  of  a  table. 
Other  examples  occur  with  regular  structures  such  as  a  set  of  drawers,  or  bookshelves. 

The  only  practical  way  to  deal  with  the  problem  of  repeating  patterns  without  greatly 
increasing  the  complexity  of  the  procedure  is  to  simply  throw  out  any  matches  which  do 
not  have  a  single  well-localized  minimum.  Several  of  the  matching  procedures  discussed 
in  Chapter  3  impose  figural  or  continuity  constraints  to  disambiguate  multiple  responses. 
However,  these  methods  operate  in  software  and  can  therefore  consider  global  information. 
A  procedure  implemented  by  specialized  VLSI  circuits  can  use  only  local  information. 
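One possible local check for a single well-localized minimum is sketched below. The source does not spell out how localization is measured, so the criterion used here is an assumption: a match is kept only if every offset scoring below the threshold lies within a small radius of the best offset. The names `localized_minimum` and `radius` are hypothetical.

```python
def localized_minimum(scores, tau, radius=1):
    """Return the best offset if it is the only region below tau, else None.

    scores maps (dx, dy) offsets to difference values V. The radius-based
    neighbourhood test is an illustrative assumption, not the thesis rule.
    """
    best = min(scores, key=scores.get)
    if scores[best] >= tau:
        return None                      # no offset passes the threshold
    for (dx, dy), value in scores.items():
        far = max(abs(dx - best[0]), abs(dy - best[1])) > radius
        if value < tau and far:
            return None                  # a second, separate candidate exists
    return best

unique = {(0, 0): 0.05, (1, 0): 0.08, (5, 5): 0.30, (-4, 2): 0.28}
repeating = dict(unique)
repeating[(5, 5)] = 0.06                 # second instance of the pattern
weak = {(0, 0): 0.20, (3, 1): 0.25}
```

Here `localized_minimum(unique, 0.1)` keeps offset `(0, 0)`, while the `repeating` and `weak` cases are both rejected. The check uses only the scores already produced during the search, so it needs no global information.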

When the test for a localized minimum is combined with the restrictions (6.20) on the fraction of allowable base edge pixels and with the threshold test (6.19), only a relatively small percentage of the $N$ blocks actually generate acceptable matches. It is therefore important
to  make  N  as  large  as  is  reasonably  possible  in  order  to  ensure  enough  correspondence 
points  are  found  to  obtain  a  good  estimate  of  the  motion.  It  should  be  noted,  however, 
that  in  spite  of  all  these  restrictions,  there  will  still  be  errors  that  cannot  be  avoided.  The 
purpose  of  the  tests  is  to  minimize  the  probability  of  these  errors  so  that  their  effect  on 
the  motion  estimates  is  minor.  Additional  steps  can  be  taken,  once  a  good  estimate  of 
the  epipolar  geometry  has  been  obtained,  to  remove  the  remaining  erroneous  matches  by 
identifying  points  which  are  significantly  off  the  epipolar  lines. 

6.3 Simulations

The  matching  procedure  has  been  tested  on  dozens  of  different  image  sequences.  The 
astronaut  and  lab  image  sequences,  shown  previously  in  Figures  5-6  and  5-7,  were  chosen 
to  illustrate  several  of  the  issues  which  have  been  raised  in  this  chapter. 

In  applying  the  matching  procedure  to  the  astronaut  sequence,  the  edge  map  from  the 
left  image  was  divided  into  400  blocks  whose  centers  were  regularly  spaced  in  a  20x20  array 
on  the  pixel  grid.  Each  block  measured  24x24  pixels,  and  the  search  in  the  right  image 
was  conducted  over  an  area  of  120x120  pixels  surrounding  the  coordinates  of  the  center 
of  each  block.  The  correspondences  which  were  found  are  numbered  sequentially  and  their 
locations  are  shown  superimposed  on  the  edge  maps  in  Figure  6-2.  The  numbered  locations 
of  the  correspondences  are  also  displayed  by  themselves  directly  below  the  edge  maps  to 
aid  the  reader  in  finding  them. 

Of  the  400  blocks,  135  passed  all  the  tests  and  generated  acceptable  matches.  The 
quality  of  the  matches  which  did  pass  the  tests  can  be  seen  to  be  quite  good.  They  are 
all  correct  to  within  possible  offset  error  caused  by  approximating  the  correspondence  at 
the  center  of  the  area  covered  by  the  block  in  the  second  image.  A  distinguishing  feature 
of the astronaut images is the absence of repeating patterns. In fact, the edge maps have almost the appearance of random dot images, and as a result, few blocks were rejected for not producing a well-localized minimum. The vast majority of those which were rejected either failed the threshold test (6.19) or did not have the edge density required to pass the test of (6.20).

The  second  sequence,  composed  of  images  taken  in  our  laboratory,  is  a  very  different 
situation.  Many  of  the  objects  in  the  scene,  i.e.,  the  bookshelves,  workstation  monitors,  and 
tripods,  have  long  linear  features  for  which  it  is  impossible  to  find  the  correct  match  with 
any  certainty  using  a  windowing  method.  There  are  also  regular  repeating  patterns,  such  as 
the  supports  on  the  bookshelves  and  the  drawer  handles,  which  generate  multiple  candidate 
matches when more than one instance is included in the search window. In addition, the motion, which includes a $\hat{z}$-axis rotation, complicates things even more by introducing a relative tilt in the edge patterns.

The  matching  procedure  was  executed  on  these  images  by  dividing  the  left  image  into 
900  blocks  (in  a  30x30  array),  each  measuring  24x24  pixels.  The  search  was  conducted 
over an area of 200 (horizontal) x 60 (vertical) pixels. Of the 900 blocks, 49 produced acceptable matches according to the different tests in the procedure. These are shown in Figure 6-3.

Figure 6-2: Binary edge maps of astronaut sequence with correspondence points found by the matching procedure.

Figure 6-3: Binary edge maps of real motion sequence (lab scene) with correspondence points found by the matching procedure.

As  in  the  astronaut  sequence,  most  of  these  matches  are  very  good.  However,  the 
proportion  of  blocks  generating  acceptable  matches  is  much  lower,  and  there  are  some 
obvious  errors,  such  as  points  13,  20,  32,  and  46.  The  first  three  of  these  points  are  simply 
weak  matches  that  passed  the  localization  test  only  because  they  had  a  single  minimum 
marginally  below  threshold,  while  the  other  minima  were  marginally  above  threshold.  The 
last  mismatched  point,  #46,  demonstrates  a  different,  but  frequently  encountered  problem. 
This  point  lies  near  the  border  of  the  left  image  on  the  lower  right  corner  of  the  workstation 
monitor  which  is  not  in  the  field  of  view  in  the  second  image.  If  the  full  monitor  had  been 
visible  in  the  second  image,  the  localization  test  would  have  rejected  the  match  since  there 
are  several  positions  where  a  low  score  could  be  obtained.  Instead,  however,  the  wrong 
match  was  accepted. 

Lowering  the  detection  threshold  is  not  a  good  solution  for  removing  the  marginal  cases 
which  slip  past  the  localization  test.  This  has  the  effect  only  of  reducing  the  total  number  of 
matches,  without  changing  the  fact  that  the  threshold  can  still  fall  in  between  the  minima  as 
in  the  cases  above.  In  fact,  there  is  not  a  simple  solution  at  this  level  for  removing  the  bad 
matches  which  escape  detection  without  compromising  the  generality  of  the  procedure.  In 
the  next  chapter  we  will  see  how  these  false  matches  affect  the  computed  motion  estimates. 

Solving the Motion Equations


As previously discussed in Section 2.2, the basic procedure for computing general camera motion, or relative orientation, given a set of point correspondences $\{(\mathbf{r}_i, \boldsymbol{\ell}_i)\}$, $i = 1, \ldots, N$, is to find the rotation and baseline direction which minimize the sum of squared errors

$$S = \sum_{i=1}^{N} \lambda_i^2 \tag{7.1}$$

where $\lambda_i$ is the triple product given in equation (2.34) as

$$\lambda_i = R\boldsymbol{\ell}_i \cdot (\mathbf{r}_i \times \mathbf{b}) \tag{7.2}$$

Since the measurements of the locations of the correspondence points are not always equally reliable, it is often appropriate to define $S$ as the weighted sum

$$S = \sum_{i=1}^{N} w_i \lambda_i^2 \tag{7.3}$$

where the weights $\{w_i\}$, $0 \le w_i \le 1$, reflect the relative confidences in the data.

The methods presented in Section 2.2 for minimizing (7.3) were developed to be executed on powerful digital computers where memory and power consumption limitations are not significant constraints. In this chapter a simplified algorithm which is much more suitable for implementation on low-level hardware, such as a programmable microcontroller, is presented. The algorithm, which is based on an adaptation of Horn's second method [4], is also an iterative nonlinear constrained minimization procedure. However, it breaks up the problem by alternating between updating the rotation and baseline, and in doing so, considerably reduces the size and complexity of the operations. The largest matrix which must be handled is 4x4, and the most complex operation at each iteration is solving a 3x3 eigenvalue-eigenvector problem.

It is not sufficient to present a method for computing camera motion without discussing problems of stability. There are several well known and analyzed cases in which the minimization problem is numerically unstable and which allow multiple solutions for the motion parameters. If we are going to build a robust system, we must be able to recognize and avoid these cases. In the next chapter, I will derive analytically the conditions for the function $S$ to have more than one local minimum, even with (almost) perfect data, and will develop a test for determining when the solution found by the algorithm is indeed a reliable estimate of the true motion. In this chapter, I will merely introduce the subject of instability and multiple solutions and will present without proof the test for determining reliability. In the last section, I will present results which demonstrate both correct and incorrect convergence of the algorithm using the data from the astronaut and lab image sequences given in the preceding chapters.

7.1  The  Simplified  Algorithm 

In Horn's method, which was briefly discussed in Section 2.2, the rotation, represented by the unit quaternion $\mathring{q}$, and the baseline, represented indirectly by the quaternion $\mathring{d} = \mathring{b}\mathring{q}$, were updated simultaneously. This resulted in an 11x11 system of linear equations to be solved at each iteration in which three of the unknowns were the Lagrange multipliers from the constraint terms.

If,  however,  the  motion  is  a  pure  rotation,  or  if  either  the  baseline  or  the  rotation  is 
known,  the  problem  becomes  much  easier.  The  simplified  algorithm  is  based  on  the  fact  that 
by  alternately  solving  these  easier  subproblems,  assuming  the  values  for  q  and  b  from  the 
previous  iteration,  the  estimates  of  the  motion  parameters  will  converge  to  those  obtained 
from  the  more  complex  method  in  which  q  and  b  are  updated  simultaneously. 

In  this  section,  I  will  first  present  the  procedures  for  solving  the  special  cases  and  then 
combine  these  into  a  complete  algorithm.  Note  that  in  the  following  derivations,  the  weights 
Wi  are  assumed  to  be  constant. 

7.1.1 Pure translation or known rotation

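The triple product $\lambda_i$ may be written as

$$\lambda_i = \mathbf{b} \cdot (\boldsymbol{\ell}'_i \times \mathbf{r}_i) \tag{7.4}$$

where $\boldsymbol{\ell}'_i$ denotes the $i$th left ray rotated into the right coordinate system. Let

$$\mathbf{c}_i = \boldsymbol{\ell}'_i \times \mathbf{r}_i \tag{7.5}$$

The weighted sum of squares can then be expressed as

$$S = \sum_{i=1}^{N} w_i (\mathbf{b} \cdot \mathbf{c}_i)^2 = \mathbf{b}^T \left(\sum_{i=1}^{N} w_i \mathbf{c}_i \mathbf{c}_i^T\right) \mathbf{b} = \mathbf{b}^T C \mathbf{b} \tag{7.6}$$

If the rotation is given, or assumed, $C$ may be treated as a constant matrix. It is a straightforward result from linear algebra that $C$, being the sum of the dyadic products $\mathbf{c}_i \mathbf{c}_i^T$, is symmetric and either positive definite, or at least, positive semi-definite. The unit vector $\mathbf{b}$ which minimizes $S$ is the eigenvector of $C$ corresponding to its smallest eigenvalue [3].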
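The baseline update can be sketched in a few lines. The example below accumulates the matrix $C$ of equation (7.6) and extracts the eigenvector belonging to its smallest eigenvalue. The eigen-solver is a plain cyclic Jacobi iteration chosen only to keep the example self-contained; the source does not say which solver is used, and the names `jacobi_eigen` and `update_b` are hypothetical.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def jacobi_eigen(a, sweeps=20):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric
    matrix, found with cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry of J^T a J.
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                      # a <- a J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                      # a <- J^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                      # v <- v J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def update_b(left_rot, right, weights):
    """Unit baseline minimizing b^T C b for rotated left rays and right rays."""
    c_mat = [[0.0] * 3 for _ in range(3)]
    for l, r, w in zip(left_rot, right, weights):
        ci = cross(l, r)
        for i in range(3):
            for j in range(3):
                c_mat[i][j] += w * ci[i] * ci[j]
    values, vectors = jacobi_eigen(c_mat)
    k = min(range(3), key=lambda i: values[i])
    return [vectors[i][k] for i in range(3)], values

# Synthetic check: pure translation, so the rotated left rays are the points.
b_true = [0.6, 0.0, 0.8]
points = [[x, y, 5.0 + (3 * i) % 4] for i, (x, y) in enumerate(
    (x, y) for x in (-1.0, 0.0, 1.5) for y in (-1.0, 0.5, 2.0))]
right = [[p[i] + b_true[i] for i in range(3)] for p in points]
b_est, eigenvalues = update_b(points, right, [1.0] * len(points))
```

The recovered `b_est` is parallel to `b_true` up to sign, which reflects the sign ambiguity of the baseline that is inherent in a quadratic form.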


7.1.2  Rotation  with  known  translation 

There is not an equivalent closed form solution, such as the one above, for finding the rotation with $\mathbf{b}$ given, or assumed. It is possible, nonetheless, to solve the minimization problem by means of a simple iterative procedure starting from an initial guess, $R_0$, for the rotation. At each iteration, $k$, we compute the incremental adjustment to the rotation, $\delta R$, such that

$$R_{k+1} = \delta R \cdot R_k \tag{7.7}$$

and

$$S(R_{k+1}) < S(R_k). \tag{7.8}$$

The procedure stops when the relative decrease in $S$ is smaller than a given tolerance.
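The incremental adjustment rotates each of the rays through an additional angle $\delta\theta$ about an axis $\hat{\boldsymbol{\eta}}$. From Rodrigues' formula, equation (2.19), the new ray directions are given by

$$\delta R\,\boldsymbol{\ell}'_i = \boldsymbol{\ell}'_i + \sin\delta\theta\,(\hat{\boldsymbol{\eta}} \times \boldsymbol{\ell}'_i) + (1 - \cos\delta\theta)\,\hat{\boldsymbol{\eta}} \times (\hat{\boldsymbol{\eta}} \times \boldsymbol{\ell}'_i) \tag{7.9}$$

which can be approximated to first order in $\delta\theta$ by

$$\delta R\,\boldsymbol{\ell}'_i \approx \boldsymbol{\ell}'_i + \delta\theta\,(\hat{\boldsymbol{\eta}} \times \boldsymbol{\ell}'_i) \tag{7.10}$$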

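The error at the end of iteration $k$ is therefore

$$\lambda_{i,k+1} = \mathbf{b} \cdot (\delta R\,\boldsymbol{\ell}'_i \times \mathbf{r}_i) = \lambda_{i,k} + \delta\theta\,\big((\mathbf{b} \times \mathbf{r}_i) \times \boldsymbol{\ell}'_i\big) \cdot \hat{\boldsymbol{\eta}} \tag{7.11}$$

Let

$$\mathbf{a}_i = (\mathbf{b} \times \mathbf{r}_i) \times \boldsymbol{\ell}'_i \quad \text{and} \quad \mathbf{m} = -\delta\theta\,\hat{\boldsymbol{\eta}} \tag{7.12}$$

so that

$$\lambda_{i,k+1} = \lambda_{i,k} - \mathbf{a}_i^T \mathbf{m} \tag{7.13}$$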

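The total error is then

$$S_{k+1} = \sum_{i=1}^{N} w_i \left(\lambda_{i,k}^2 - 2\lambda_{i,k}\,\mathbf{a}_i^T \mathbf{m} + (\mathbf{a}_i^T \mathbf{m})^2\right) = S_k - 2\mathbf{h}^T \mathbf{m} + \mathbf{m}^T A\,\mathbf{m} \tag{7.14}$$

where we have defined

$$\mathbf{h} = \sum_{i=1}^{N} w_i \lambda_{i,k}\,\mathbf{a}_i \tag{7.15}$$

and

$$A = \sum_{i=1}^{N} w_i\,\mathbf{a}_i \mathbf{a}_i^T \tag{7.16}$$

Except for pathological cases, i.e., when the field of view is zero or $N < 3$, $A$ will be invertible and positive definite. Accordingly, equation (7.14) possesses a unique minimum when

$$\mathbf{m} = A^{-1}\mathbf{h} \tag{7.17}$$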
and hence the $\delta\theta$ and $\hat{\boldsymbol{\eta}}$ which minimize $S$ to first order are given by

$$\delta\theta = |\mathbf{m}| \tag{7.18}$$

and

$$\hat{\boldsymbol{\eta}} = -\frac{\mathbf{m}}{|\mathbf{m}|} \tag{7.19}$$

In order to preserve orthonormality, $\delta R$ should be computed exactly from $\delta\theta$ and $\hat{\boldsymbol{\eta}}$ using Rodrigues' formula, equation (2.17), without approximation. Alternatively, one can maintain the rotation in unit quaternion form by computing

$$\delta\mathring{q} = \left(\cos\frac{\delta\theta}{2},\ \hat{\boldsymbol{\eta}}\sin\frac{\delta\theta}{2}\right) \tag{7.20}$$

and

$$\mathring{q}_{k+1} = \delta\mathring{q}\,\mathring{q}_k \tag{7.21}$$

The rules for transforming between unit quaternions and orthonormal matrices are given in Appendix A.
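The rotation update with a known baseline can be sketched as follows. Each step forms the vectors $\mathbf{a}_i = (\mathbf{b} \times \mathbf{r}_i) \times \boldsymbol{\ell}'_i$, accumulates the 3x3 system, solves it, and then applies the exact Rodrigues rotation to the current left rays. The synthetic scene, the use of Cramer's rule and all function names are assumptions made for the illustration; a fixed iteration count stands in for the relative-decrease stopping rule.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def rodrigues(axis, angle, v):
    """Rotate v by angle about the unit vector axis (exact, no approximation)."""
    k1 = cross(axis, v)
    k2 = cross(axis, k1)
    s, c = math.sin(angle), 1.0 - math.cos(angle)
    return [v[i] + s * k1[i] + c * k2[i] for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, h):
    """Solve the 3x3 system a m = h by Cramer's rule."""
    d = det3(a)
    return [det3([[h[r] if c == col else a[r][c] for c in range(3)]
                  for r in range(3)]) / d for col in range(3)]

def total_error(left_rot, right, b, weights):
    return sum(w * dot(b, cross(l, r)) ** 2
               for l, r, w in zip(left_rot, right, weights))

def rotation_step(left_rot, right, b, weights):
    """One first-order update: returns the axis and angle of the increment."""
    big_a = [[0.0] * 3 for _ in range(3)]
    h = [0.0] * 3
    for l, r, w in zip(left_rot, right, weights):
        lam = dot(b, cross(l, r))               # current triple product
        a = cross(cross(b, r), l)               # a_i = (b x r_i) x l'_i
        for i in range(3):
            h[i] += w * lam * a[i]
            for j in range(3):
                big_a[i][j] += w * a[i] * a[j]
    m = solve3(big_a, h)
    angle = math.sqrt(dot(m, m))
    if angle == 0.0:
        return [1.0, 0.0, 0.0], 0.0
    return [-x / angle for x in m], angle       # axis = -m / |m|

# Synthetic scene (assumed): 5 degree roll about z, baseline along x.
b = [1.0, 0.0, 0.0]
true_axis, true_angle = [0.0, 0.0, 1.0], math.radians(5.0)
points = [[x, y, 5.0 + (3 * i) % 4] for i, (x, y) in enumerate(
    (x, y) for x in (-3.0, 0.0, 3.0) for y in (-3.0, 0.0, 3.0))]
left = [unit(p) for p in points]
right = [unit([q + t for q, t in zip(rodrigues(true_axis, true_angle, p), b)])
         for p in points]
weights = [1.0] * len(points)

rays = left[:]                                  # initial guess: no rotation
errors = [total_error(rays, right, b, weights)]
for _ in range(15):
    axis, angle = rotation_step(rays, right, b, weights)
    rays = [rodrigues(axis, angle, l) for l in rays]
    errors.append(total_error(rays, right, b, weights))
```

Because the data in this sketch are noise-free, the error should shrink rapidly toward zero. With real correspondences the loop would instead terminate on the relative decrease in $S$, as described above.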

7.1.3  Pure  rotation  (|b|  =  0) 

If the motion is a pure rotation, the procedure just described will still work given an arbitrary value for $\mathbf{b}$, but there is a simpler closed form method which can be applied. When $|\mathbf{b}| = 0$ we have, going back to the notation of equation (2.4) in Chapter 2,

$$\mathbf{p}_{r_i} = R\,\mathbf{p}_{\ell_i} \tag{7.22}$$

By the length preserving property of rotations,

$$|\mathbf{p}_{r_i}| = |\mathbf{p}_{\ell_i}| \tag{7.23}$$

If spherical projection (2.15) is used so that

$$\mathbf{r}_i = \frac{\mathbf{p}_{r_i}}{|\mathbf{p}_{r_i}|} \quad \text{and} \quad \boldsymbol{\ell}_i = \frac{\mathbf{p}_{\ell_i}}{|\mathbf{p}_{\ell_i}|} \tag{7.24}$$

it will also be true that

$$\mathbf{r}_i = R\,\boldsymbol{\ell}_i = \boldsymbol{\ell}'_i \tag{7.25}$$

There are two possibilities for finding the rotation that best satisfies (7.25) in a least squares sense for the $N$ correspondence points. The first is to define the error

$$\boldsymbol{\epsilon}_i = \mathbf{r}_i \times \boldsymbol{\ell}'_i \tag{7.26}$$

and minimize

$$S = \sum_{i=1}^{N} w_i |\boldsymbol{\epsilon}_i|^2 \tag{7.27}$$

This formulation leads directly to a procedure similar to that defined previously for the case of known translation.

The second method is to note that ideally

$$\mathbf{r}_i \cdot \boldsymbol{\ell}'_i = 1 \tag{7.28}$$

so that we can also solve for the rotation which maximizes

$$S' = \sum_{i=1}^{N} w_i\,(\mathbf{r}_i \cdot \boldsymbol{\ell}'_i) \tag{7.29}$$

This formulation was previously used by Horn in an algorithm to compute absolute orientation¹ [88]. Writing (7.29) with the rotation expressed by unit quaternions we have

$$S' = \sum_{i=1}^{N} w_i\,(\mathring{r}_i \cdot \mathring{q}\,\mathring{\ell}_i\,\mathring{q}^*) \tag{7.30}$$

¹The difference between absolute and relative orientation is that in the former the distances to objects in the scene are known. Consequently one can vectorially subtract the translation once it has been computed to arrive at the pure rotation case. Note that this cannot be done for relative orientation since absolute distances are not known. Hence this method is only applicable if in fact the motion is a pure rotation.

This expression can be cast into a more convenient form by introducing quaternion matrices. As shown in Appendix A, if $\mathring{a}$ and $\mathring{b}$ are two quaternions then

$$\mathring{a}\mathring{b} = A\mathring{b} = B\mathring{a} \tag{7.31}$$

$A$ is referred to as the left quaternion matrix associated with $\mathring{a}$, and $B$ is referred to as the right quaternion matrix associated with $\mathring{b}$. Using (7.31) $S'$ can be rewritten as

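$$S' = \sum_{i=1}^{N} w_i\,(\mathcal{R}_i \mathring{q})^T (\mathcal{L}_i \mathring{q}) = -\sum_{i=1}^{N} w_i\,\mathring{q}^T M_i\,\mathring{q} \tag{7.32}$$

where $M_i = \mathcal{R}_i \mathcal{L}_i$, with $\mathcal{R}_i$ the left quaternion matrix associated with $\mathring{r}_i$ and $\mathcal{L}_i$ the right quaternion matrix associated with $\mathring{\ell}_i$. The minus sign arises from the fact that $\mathcal{R}_i^T = -\mathcal{R}_i$. We can remove $\mathring{q}$ from the summation to obtain

$$S' = -\mathring{q}^T \left(\sum_{i=1}^{N} w_i M_i\right) \mathring{q} = -\mathring{q}^T M \mathring{q} \tag{7.33}$$

using $M$ to represent the sum in parentheses.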

S'  is  thus  maximized  by  identifying  q  with  the  eigenvector  of  M  corresponding  to  its 
most  negative  eigenvalue.  Since  M  is  a  4x4  matrix,  there  is  in  principle  a  closed  form 
solution  for  q,  although  it  may  be  simpler  to  obtain  the  result  by  a  standard  iterative 
procedure. 
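The quadratic form can be checked numerically without an eigen-solver. The sketch below builds $M$ as the weighted sum of products of the left quaternion matrix of each right ray and the right quaternion matrix of each left ray, then compares $-\mathring{q}^T M \mathring{q}$ with the directly computed sum of dot products. The scalar-first quaternion layout, the sample rays and the function names are assumptions made for the example.

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions stored as (scalar, x, y, z)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def left_matrix(a):
    """Matrix A with a*x == A x for any quaternion x."""
    a0, a1, a2, a3 = a
    return [[a0, -a1, -a2, -a3], [a1, a0, -a3, a2],
            [a2, a3, a0, -a1], [a3, -a2, a1, a0]]

def right_matrix(b):
    """Matrix B with x*b == B x for any quaternion x."""
    b0, b1, b2, b3 = b
    return [[b0, -b1, -b2, -b3], [b1, b0, b3, -b2],
            [b2, -b3, b0, b1], [b3, b2, -b1, b0]]

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    return list(qmul(qmul(q, (0.0, *v)), qconj(q))[1:])

def build_m(left_rays, right_rays, weights):
    m = [[0.0] * 4 for _ in range(4)]
    for l, r, w in zip(left_rays, right_rays, weights):
        rm, lm = left_matrix((0.0, *r)), right_matrix((0.0, *l))
        for i in range(4):
            for j in range(4):
                m[i][j] += w * sum(rm[i][k] * lm[k][j] for k in range(4))
    return m

def s_direct(q, left_rays, right_rays, weights):
    return sum(w * sum(x * y for x, y in zip(r, rotate(q, l)))
               for l, r, w in zip(left_rays, right_rays, weights))

def s_quadratic(q, m):
    return -sum(q[i] * m[i][j] * q[j] for i in range(4) for j in range(4))

# Pure rotation of 30 degrees about the unit axis (1/3, 2/3, 2/3).
half = math.radians(15.0)
q_true = (math.cos(half),) + tuple(math.sin(half) * c for c in (1/3, 2/3, 2/3))
left_rays = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8], [-0.6, 0.0, 0.8]]
right_rays = [rotate(q_true, l) for l in left_rays]
weights = [1.0, 1.0, 0.5, 0.5]
m_matrix = build_m(left_rays, right_rays, weights)
identity = (1.0, 0.0, 0.0, 0.0)
```

At the true rotation every dot product equals one, so both forms evaluate to the sum of the weights, while any other unit quaternion gives a smaller value. This is what makes the eigenvector with the most negative eigenvalue of $M$ the maximizer.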

7.1.4  The  complete  algorithm 

Combining the procedures for the special cases we can formulate an algorithm to solve for the general case of unknown translation and rotation as follows:

    Input: q(0), b(0), data
    if b = 0
        q = Pure_Rotate(data)
    else {
        k = 0
        change = 1
        while (change > ε) {
            q(k+1) = Update_q(q(k), b(k), data)
            b(k+1) = Update_b(q(k+1), data)
            change = (S(k) - S(k+1)) / S(k)
            k = k + 1
        }
    }


Pure_Rotate(), Update_b(), and Update_q() correspond to the procedures described in 7.1.3, 7.1.1, and 7.1.2, respectively. It is necessary to start the algorithm with initial values for $\mathbf{b}$ and $\mathring{q}$. These can be provided either externally or by obtaining $\mathring{q}^{(0)}$ from Pure_Rotate() and $\mathbf{b}^{(0)}$ from Update_b() with $\mathring{q}$ set to $(1, 0, 0, 0)$. It is easily seen that the weighted sum of squares, $S$, monotonically decreases with each iteration since it decreases at each step. Since $S$ is bounded below by zero, the algorithm will converge to some local minimum or stationary point.
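The control flow of the alternating scheme, and the reason the cost cannot increase, can be sketched generically. In the example below the two update steps are passed in as callables and are exercised on a toy quadratic cost rather than on the motion equations, so that the loop itself is the only thing being illustrated. The stopping rule on the relative decrease and every name in the block are assumptions.

```python
def alternate(update_q, update_b, cost, q0, b0, eps=1e-12, max_iter=100):
    """Alternate two partial minimizations until the relative decrease
    of the cost falls to eps or below. Returns (q, b, cost history)."""
    q, b = q0, b0
    previous = cost(q, b)
    history = [previous]
    for _ in range(max_iter):
        q = update_q(q, b)          # best q for the current b
        b = update_b(q)             # best b for the new q
        current = cost(q, b)
        history.append(current)
        change = (previous - current) / previous if previous > 0 else 0.0
        previous = current
        if change <= eps:
            break
    return q, b, history

# Toy stand-in for S: each update is the exact minimizer in one variable.
def toy_cost(q, b):
    return (q - b) ** 2 + (q - 3.0) ** 2 + (b + 1.0) ** 2

def toy_update_q(q, b):
    return (b + 3.0) / 2.0          # argmin over q with b held fixed

def toy_update_b(q):
    return (q - 1.0) / 2.0          # argmin over b with q held fixed

q_final, b_final, history = alternate(toy_update_q, toy_update_b,
                                      toy_cost, 0.0, 0.0)
```

Since each step minimizes the cost exactly in one variable with the other held fixed, the recorded history is non-increasing, which mirrors the monotonicity argument given for $S$. For this toy cost the iterates settle at `q = 5/3` and `b = 1/3`.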


7.2  Ambiguities  and  Multiple  Solutions 

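There are four fundamental ambiguities associated with any pair $(\mathring{q}, \mathbf{b})$ which minimize the weighted sum of squares $S$. Since the equations involve only quadratic forms, $S$ is unchanged by multiplying either $\mathring{q}$ or $\mathbf{b}$ by $-1$. The solution $-\mathring{q}$ is trivial since it corresponds to the same rotation as $\mathring{q}$. Changing the sign of $\mathbf{b}$, however, reverses the direction of the baseline which also affects the sign of the $Z$ coordinates computed for objects in the scene.

A more subtle ambiguity occurs by imposing an additional rotation of $\pi$ radians about the baseline which is equivalent to replacing $\mathring{q}$ by $\mathring{d} = \mathring{b}\mathring{q}$. We previously derived in equation (2.39) that

$$\lambda_i = \mathring{r}_i \mathring{b} \mathring{q} \cdot \mathring{q} \mathring{\ell}_i \tag{7.34}$$

and it is easily verified from the identities of Appendix A that replacing $\mathring{q}$ by $\mathring{b}\mathring{q}$ results in

$$\begin{aligned}
\mathring{r}_i \mathring{b}\mathring{b}\mathring{q} \cdot \mathring{b}\mathring{q}\mathring{\ell}_i &= -\mathring{r}_i \mathring{q} \cdot \mathring{b}\mathring{q}\mathring{\ell}_i \\
&= -\mathring{r}_i \cdot \mathring{b}\,\mathring{\ell}'_i \\
&= -\lambda_i
\end{aligned} \tag{7.35}$$

and hence will give the same total squared error.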

For any solution $(\mathring{q}, \mathbf{b})$, therefore, $(\mathring{q}, -\mathbf{b})$, $(\mathring{d}, \mathbf{b})$, and $(\mathring{d}, -\mathbf{b})$ are also solutions. Each is derivable from the others, however, and only one should be feasible given the constraint that the imaged points are visible to both cameras. It is conventional therefore to count these four solutions as one [7].
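The sign behaviour behind these four equivalent solutions can be verified with ordinary vectors. The sketch below uses the fact that a rotation by $\pi$ about a unit vector $\mathbf{b}$ maps $\mathbf{v}$ to $2(\mathbf{b} \cdot \mathbf{v})\mathbf{b} - \mathbf{v}$, and checks that both reversing the baseline and adding the half turn only flip the sign of the triple product. The sample vectors and function names are arbitrary choices for the illustration.

```python
def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def triple(b, l_rot, r):
    """Triple product b . (l' x r) for a rotated left ray and a right ray."""
    return dot(b, cross(l_rot, r))

def half_turn_about(b, v):
    """Rotate v by pi radians about the unit vector b."""
    s = 2.0 * dot(b, v)
    return [s * bi - vi for bi, vi in zip(b, v)]

b = [0.6, 0.0, 0.8]                  # unit baseline
l_rot = [0.1, -0.3, 0.9]             # rotated left ray (not normalized)
r = [0.25, 0.2, 0.95]                # right ray (not normalized)

lam = triple(b, l_rot, r)
lam_reversed = triple([-x for x in b], l_rot, r)
lam_half_turn = triple(b, half_turn_about(b, l_rot), r)
```

Both modified triple products equal `-lam`, so their squares, and therefore the total error $S$, are unchanged. Only the visibility constraint on the imaged points can single out the feasible member of the family.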

If the data are error-free, a finite number of solutions exist to the non least-squares problem of solving $\lambda_i = 0$ for all $N$ pairs if there are at least five distinct ray pairs. Faugeras and Maybank [7] showed that in general there are 10 solutions to the five-point problem. If at least eight pairs are available, the solution is unique for most configurations of points. However, as first shown by Tsai and Huang [28] and Longuet-Higgins [8], there are configurations for which multiple solutions exist. Horn showed that only hyperboloids of one sheet and their degenerate forms viewed from a point on their surface could allow multiple interpretations [89], while Negahdaripour further demonstrated that only certain types of hyperboloids of one sheet and their degeneracies can result in an ambiguity, and in these cases, there are at most three possible solutions [40].

A more important concern for the present system is the fact that the function $S$ may contain multiple local minima into which the algorithm can be trapped. The surfaces which give rise to multiple solutions of the equation $S = 0$ are rarely encountered in practice, and are even less likely to arise by chance due to errors in the matching process. However, as will be demonstrated analytically in the next chapter, many environments can, in a statistical sense, have a depth distribution which mimics the effects of the special surfaces, and can thus generate multiple local minima.

The conditions under which multiple solutions to the minimization problem most frequently arise are well known to be a function of the type of motion. Daniilidis and Nagel [6] derived analytically the conditions for instability in the case of pure translational motion or of translation with known rotation. They found that the extreme case occurs when the translation vector is parallel to the image plane and is accentuated as the field of view narrows. They as well as others (Spetsakis and Aloimonos [12], Horn [3], Weng et al. [90]) have proposed changing the error norm that is minimized to weight only the perpendicular distance of a point from its epipolar line in order to reduce the chance of convergence to an alternate minimum. Horn derived the symmetric weighting term in [3] as

$$w_i = \frac{|\mathbf{r}_i \times \boldsymbol{\ell}'_i|^2\,\sigma_0^2}{\big((\mathbf{b} \times \mathbf{r}_i) \cdot (\mathbf{r}_i \times \boldsymbol{\ell}'_i)\big)^2 |\boldsymbol{\ell}'_i|^2 \sigma_{\ell_i}^2 + \big((\mathbf{b} \times \boldsymbol{\ell}'_i) \cdot (\mathbf{r}_i \times \boldsymbol{\ell}'_i)\big)^2 |\mathbf{r}_i|^2 \sigma_{r_i}^2} \tag{7.36}$$

where $\sigma_{r_i}^2$ and $\sigma_{\ell_i}^2$ represent the variances of the right and left rays in the $i$th measurement, and $\sigma_0$ is an arbitrary constant which is included to maintain consistency in the units.

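One can gain a better understanding of equation (7.36) by going back to the rigid body motion equation which, even with imperfect data, must be approximately satisfied

$$\mathbf{p}_{r_i} \approx R\,\mathbf{p}_{\ell_i} + \mathbf{b} \tag{7.37}$$

Writing $\mathbf{r}_i = \mathbf{p}_{r_i}/Z_{r_i}$ and $\boldsymbol{\ell}_i = \mathbf{p}_{\ell_i}/Z_{\ell_i}$, we can see that

$$\mathbf{b} \times \mathbf{r}_i \approx Z_{\ell_i}(\mathbf{r}_i \times \boldsymbol{\ell}'_i) \quad \text{and} \quad \mathbf{b} \times \boldsymbol{\ell}'_i \approx Z_{r_i}(\mathbf{r}_i \times \boldsymbol{\ell}'_i) \tag{7.38}$$

and hence equation (7.36) can be written as

$$w_i \approx \frac{\sigma_0^2}{|\mathbf{r}_i \times \boldsymbol{\ell}'_i|^2 \left(Z_{r_i}^2 |\mathbf{r}_i|^2 \sigma_{r_i}^2 + Z_{\ell_i}^2 |\boldsymbol{\ell}'_i|^2 \sigma_{\ell_i}^2\right)} = \frac{\sigma_0^2}{|\mathbf{r}_i|^2 |\boldsymbol{\ell}'_i|^2 \sin^2\alpha_i \left(Z_{r_i}^2 |\mathbf{r}_i|^2 \sigma_{r_i}^2 + Z_{\ell_i}^2 |\boldsymbol{\ell}'_i|^2 \sigma_{\ell_i}^2\right)} \tag{7.39}$$

where $\alpha_i$ is the angle between $\mathbf{r}_i$ and $\boldsymbol{\ell}'_i$.

For rays corresponding to points approaching infinity, $Z_{r_i} \approx Z_{\ell_i}$ and $\alpha_i \to 0$ as $1/Z$, resulting in

$$w_i \to \frac{\sigma_0^2}{|\mathbf{r}_i|^2 |\boldsymbol{\ell}'_i|^2 \left(|\mathbf{r}_i|^2 \sigma_{r_i}^2 + |\boldsymbol{\ell}'_i|^2 \sigma_{\ell_i}^2\right)} \tag{7.40}$$

However, for points near the cameras, assuming their relative angle of rotation is $< 90°$, $\alpha_i$ becomes larger as $Z_{r_i}$ and $Z_{\ell_i}$ go to zero, so that in the limit, $w_i \to \infty$.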

Equation (7.36) thus correctly weights rays corresponding to points closer to the cameras more strongly than those corresponding to far away points. Unfortunately, it is necessary to know either the epipolar geometry or the $Z$ coordinates of the matched points in advance in order to use this equation. If the $w_i$ are computed from the current estimate of the baseline and rotation, then it is not possible to prove that the algorithm will converge to an unbiased estimate of the correct solution.

In an automated system for computing motion, it is more important to be able to identify when the algorithm has converged to the wrong stationary point than it is to try to ensure that it never does. The numerical instability of the algorithm in the case of translation parallel to the image plane is inherent and cannot be removed without prior knowledge of the motion. Nonetheless, we can often obtain useful estimates of the motion, even under these conditions, when it can be determined that the algorithm has converged to the local minimum closest to the true solution. In the next chapter, I will show that the most reliable indicator of correct convergence is the ratio $\mu_2/\mu_3$, where $\mu_2$ and $\mu_3$ are the middle and largest eigenvalues of the matrix $C$ defined in equation (7.6). This ratio theoretically depends on the orientation of $\mathbf{b}$ with respect to the vector $\hat{\mathbf{v}}_3 = R\hat{\mathbf{z}}$. For translation parallel to the image plane, and therefore approximately perpendicular to $\hat{\mathbf{v}}_3$, $\mu_2/\mu_3$ is an increasing function of the field of view and will be $\ll 1$ for most practical imaging systems. When the translation is parallel to $\hat{\mathbf{v}}_3$, however, $\mu_2/\mu_3 \approx 1$. We can thus determine if the algorithm has converged to the correct estimate by comparing the actual ratio to the one predicted from the values of $\mathbf{b}$ and $R$ returned by the algorithm. If the actual ratio is small compared to its predicted value, we can reject the solution as unreliable and proceed to the next set of images.
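The acceptance rule based on the eigenvalue ratio reduces to a single comparison, sketched below. The text does not say how small is small enough, so the `min_fraction` parameter is purely hypothetical. The ratios in the example are the second-pass values reported for the astronaut sequence in Table 7.1.

```python
def convergence_is_reliable(actual_ratio, predicted_ratio, min_fraction=0.75):
    """Accept the motion estimate when the measured eigenvalue ratio is not
    small compared with the predicted one. min_fraction is an assumed
    tolerance, not a value given in the thesis."""
    return actual_ratio >= min_fraction * predicted_ratio

first_test = convergence_is_reliable(0.441, 0.475)    # correct convergence
second_test = convergence_is_reliable(0.471, 0.977)   # alternate minimum
```

With this assumed tolerance the first test is accepted and the second is rejected, which matches the outcome described for the two starting points.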

Once it has been determined that the computed motion is reliable, a variety of techniques may be used to improve the estimates. For example, we can execute the algorithm several times to remove outliers, i.e., correspondence pairs for which $|\lambda_i|$ is much greater than the average, or to apply weighting factors such as given by equation (7.36) using the previous estimates of $\mathbf{b}$ and $\mathring{q}$.

7.3  Simulations 

The results of applying the simplified algorithm to the astronaut and lab sequences previously
seen in Figures 5-6 and 5-7 are given in Tables 7.1-7.3. In the astronaut sequence,
which was generated by software, the motion with respect to the origin of the camera
coordinate system is known exactly, while for the lab sequence, it is only known approximately.
Both sequences, however, correspond approximately to the classically unstable case
of translation perpendicular to $R\hat{z}$.

                  b                          θ        ω                          S         μ2/μ3
                                                                                           actual  pred.
True motion       (1.0, 0.0, 0.0)            5°       (0.0, 1.0, 0.0)

First Test. Initial values from estimates of pure translation and rotation
Initial values    (.9982, -.0129, -.0591)    9.72°    (-.0045, .9998, -.0200)    7.7e-4
First pass        (.9998, -.0164, -.0120)    4.95°    (-.0132, .9999, -.0011)    6.0e-6    .418    .475
Second pass       (.9999, -.0133, -.0083)    4.99°    (-.0106, .9999, 0.00)      3.4e-6    .441    .475

Second Test. Initial values chosen to result in alternate local minimum
Initial values    (0.0, 0.0, 1.0)            5°       (0.0, 1.0, 0.0)            4.9e-3
First pass        (.0481, .0093, .9988)      10.66°   (-.0016, 1.0000, -.0036)   7.1e-6    .586    .979
Second pass       (.0403, -.0040, .9992)     10.63°   (-.0012, 1.0000, -.0004)   3.1e-7    .471    .977

Table 7.1: Simulation results on the astronaut sequence.

The correspondence points used to compute the motion for the astronaut images are
those shown in Figure 6-2, which were generated by the matching procedure. As was previously
noted, the quality of these matches is very good, and this is reflected in the closeness
of  the  calculated  and  the  true  motion.  Table  7.1  shows  the  results  of  two  tests.  In  the  first, 
initial  values  were  chosen  by  computing  the  best  pure  translation  and  best  pure  rotation 
which  fit  the  data.  In  order  to  improve  the  estimates,  two  passes  of  the  algorithm  were 
performed,  with  the  first  to  obtain  an  initial  estimate  and  identify  outliers  to  be  removed. 
As  can  be  seen,  there  is  very  little  difference  in  the  solutions  computed  in  the  two  passes 
for  the  astronaut  sequence.  Of  the  135  correspondence  points,  only  19  were  found  to  have 
an  error  greater  than  one  standard  deviation  above  the  mean.  The  reliability  of  the  results 
is also evidenced by the closeness of the actual and predicted values for the ratio $\mu_2/\mu_3$,
given that the field of view for this simulated sequence was set at approximately 55°.
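
The outlier-removal step between passes can be illustrated with a short Python sketch. It assumes that a correspondence is rejected when the magnitude of its error exceeds the mean by more than one standard deviation, which is the criterion quoted above for the astronaut sequence; whether the criterion is applied to the magnitude or to the signed error is an assumption of this sketch, and the function name `split_outliers` and the parameter `n_sigma` are hypothetical.

```python
from statistics import mean, pstdev

def split_outliers(errors, n_sigma=1.0):
    """Return (keep, reject) index lists for a sequence of per-correspondence errors.

    A point is rejected when |error| > mean(|error|) + n_sigma * std(|error|).
    """
    mags = [abs(e) for e in errors]
    threshold = mean(mags) + n_sigma * pstdev(mags)
    keep = [i for i, m in enumerate(mags) if m <= threshold]
    reject = [i for i, m in enumerate(mags) if m > threshold]
    return keep, reject
```

The second pass of the algorithm would then be run only on the correspondences whose indices are in `keep`.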

Despite the fact that the field of view is relatively large and there are many correspondence
points, it is nonetheless possible to make the algorithm converge to another local
minimum. Starting the algorithm with an initial value of $\mathbf{b} = \hat{z}$, shown as the second test
in Table 7.1, the result is quite far from the known solution. Removing outliers from the
data does not prevent the algorithm from converging to this incorrect result, nor does
any other weighting scheme. It is interesting to note that the value of $\bar{S} = S/N$ is of no use
in discriminating between the correct and incorrect solutions since it is very small in both
cases. The actual and predicted values of $\mu_2/\mu_3$, however, do show the difference. In the
second test they are .586 and .979 in the first pass and .471 and .977 in the second. The
fact that the predicted ratio is significantly different from the actual value indicates that
these results are unreliable.

                                    b                          θ        ω                          S          μ2/μ3
                                                                                                              actual  pred.
Motion of stage                     (1.0, 0.0, 0.0)            5°       (0.0, 0.0, -1.0)

First Test. Initial values from estimates of pure translation and rotation
Initial values                      (.5867, -.0878, .8050)     7.43°    (.0278, .8102, -.5855)     7.06e-5
First pass                          (.9937, .0007, .1120)      5.04°    (.1286, -.3815, -.9154)    4.09e-5    .027    .084
Second pass (rejects: 20, 39, 43)   (.0247, -.1316, .9910)     7.86°    (.0387, .8559, -.5156)     1.84e-5    .045    .776

Second Test. Initial values chosen to converge to estimate of correct motion
Initial values                      (.9990, .0316, .0316)      7.43°    (.0278, .8102, -.5855)     1.21e-4
First pass                          (.9961, -.0543, .0703)     5.12°    (.0207, -.3750, -.9268)    3.41e-5    .027    .083
Second pass (rejects: 39, 43)       (.9972, -.0513, .0552)     5.88°    (.0137, -.5309, -.8474)    3.91e-6    .026    .083

Third Test. Same initial values as 2nd test, but points 13, 20, 32, 46 removed by hand.
Initial values                      (.9990, .0316, .0316)      7.49°    (.0274, .8184, -.5740)     1.02e-4
First pass                          (.9940, -.0085, .1095)     4.84°    (.1126, -.2912, -.9500)    3.67e-5    .028    .084
Second pass (rejects: 39, 43)       (.9961, -.0056, .0885)     5.74°    (.1026, -.5122, -.8527)    4.09e-6    .026    .083

Table 7.2: Simulation results on the lab sequence with points from automatic matching
procedure.

                                    b                          θ        ω                          S          μ2/μ3
                                                                                                              actual  pred.
Motion of stage                     (1.0, 0.0, 0.0)            5°       (0.0, 0.0, -1.0)

First Test. Initial values from estimates of pure translation and rotation
Initial values                      (.9915, -.0656, -.1126)    7.69°    (-.0019, .8193, -.5733)    1.29e-4
First pass                          (.9964, -.0808, .0262)     6.07°    (-.0484, -.5737, -.8176)   2.64e-6    .067    .087
Second pass (rejects: 16, 17, 20)   (.9964, -.0802, .0287)     6.0°     (-.0453, -.5591, -.8278)   1.55e-6    .073    .083

Table 7.3: Simulation results on the lab sequence with hand-picked correspondence points.


In  the  lab  sequence,  the  exact  motion  with  respect  to  the  camera  coordinate  system  is 
unknown  because  the  system  was  not  accurately  calibrated.  In  order  to  evaluate  the  quality 
of  the  correspondences  obtained  by  the  matching  procedure  against  a  known  standard,  we 
compare  the  results  for  the  automatic  data  with  those  from  a  second  set  of  correspondence 
points chosen by hand. To estimate the motion for both sets of data, we use the following
approximate internal calibration matrix, derived from the manufacturer's data on the
camera and lens used with these images,

$$K_c = \begin{pmatrix} 567 & 0.0 & 378 \\ 0.0 & 484 & 242 \\ 0.0 & 0.0 & 1.0 \end{pmatrix} \qquad (7.41)$$

to transform between image plane and world coordinates, as described in Section 2.1.1.
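
As an illustration of how such a matrix is typically applied, the following Python sketch maps a pixel location to a ray direction with unit third component and back. It assumes the usual convention in which the diagonal entries of the matrix in equation (7.41) are the focal lengths in pixels and the third column holds the principal point; the exact convention of Section 2.1.1 may differ, and the function names are hypothetical.

```python
# Entries taken from equation (7.41); their interpretation is an assumption of this sketch.
FX, FY = 567.0, 484.0   # focal lengths in pixel units (horizontal, vertical)
CX, CY = 378.0, 242.0   # principal point in pixels

def pixel_to_ray(u, v):
    """Map pixel (u, v) to the ray direction (x, y, 1) in camera coordinates."""
    return ((u - CX) / FX, (v - CY) / FY, 1.0)

def ray_to_pixel(ray):
    """Project a ray direction (x, y, z) with z != 0 back to pixel coordinates."""
    x, y, z = ray
    return (FX * x / z + CX, FY * y / z + CY)
```

Under this convention the principal point (378, 242) maps to the optical axis direction (0, 0, 1).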

The correspondences obtained by the matching procedure are shown in Figure 6-3. It
was previously observed that four of the points, specifically 13, 20, 32, and 46, were clearly
wrong, while it was less obvious if other points were also in error. Table 7.2 lists the results of
three tests conducted on the data from the matching procedure. In the first, the estimates of
pure translation and pure rotation were used as starting values for the algorithm. Although
the initial value for b, (.5867, -.0878, .8050), is quite far from the approximate translation
direction, the algorithm does converge to a reasonably close solution on the first pass.
On the second pass, however, it falls into an alternate stationary point, indicating that
removing outliers on a heuristic basis does not always give a better estimate. The actual
and predicted values of $\mu_2/\mu_3$ once again point out the difference in the two results. In the
first we have .027 and .084 for the actual and predicted ratios, computed for an effective
field of view of 30°, while in the second we have .045 and .776, indicating an unreliable
result.

In  the  second  test,  a  starting  value  of  b  =  (.9990,  .0316,  .0316)  was  used,  and  this  time 
the  algorithm  converged  to  a  reliable  solution  on  both  passes.  Interestingly,  none  of  the 
four  clearly  incorrect  matches  was  rejected  after  the  first  pass,  although  points  39  and  43, 
which are not so obviously wrong, were rejected. The actual and predicted values of $\mu_2/\mu_3$,
(.027, .083) on the first pass and (.026, .083) on the second, are very close and indicate
that  the  solutions  are  reliable.  Although  the  errors  for  points  13,  20,  32,  and  46  are  quite 
noticeable,  careful  examination  reveals  that  they  do  not  have  a  large  component  in  the 
direction  perpendicular  to  the  correct  epipolar  line,  while  points  39  and  43,  which  were 
rejected,  do.  In  the  third  test,  we  verify  directly  that  these  four  points  have  little  effect  on 
the  computed  motion  by  manually  removing  them  from  the  data  set.  As  seen  in  Table  7.2, 
the  results  of  this  test  are  almost  identical  to  those  of  the  previous  one. 

Figure 7-1: Binary edge maps of real motion sequence with hand-picked point matches.

The manually chosen correspondence points for this sequence are shown in Figure 7-1.
There are 28 points in all, which were selected using a high resolution display and a
mouse-driven pointer. The test previously performed on the points found by the matching
procedure, in which the motion was computed using initial values derived from the
estimates of pure translation and rotation, was repeated on these data, with the
results given in Table 7.3. This time the initial estimate of the translation was much
closer to the actual value and the algorithm converged to the correct estimate on both
passes. As can be seen by comparing Tables 7.3 and 7.2, the estimates computed for the
manually and the automatically chosen points are very close: b = (.9964, -.0802, .0287),
θ = 6.0°, $\hat{\omega}$ = (-.0453, -.5591, -.8278) for the manual data vs. b = (.9972, -.0513, .0552),
θ = 5.88°, $\hat{\omega}$ = (.0137, -.5309, -.8474) and b = (.9961, -.0056, .0885), θ = 5.74°, $\hat{\omega}$ =
(.1026, -.5122, -.8527) in the second and third tests with the automatic data.

Based  on  the  results  from  both  the  astronaut  and  lab  sequences,  we  can  thus  conclude 
that  the  data  obtained  by  the  matching  procedure  are  of  comparable  quality,  at  least  with 
respect  to  estimating  motion,  to  those  obtained  by  more  elaborate  methods. 


Chapter  8 


The  Effects  of  Measurement  Errors 


Understanding  the  effects  of  errors  in  the  data  on  the  estimated  motion  is  critical  in 
designing  an  automated  system  for  tasks  that  demand  high  reliability,  such  as  navigating  an 
autonomous  vehicle.  Previous  studies  on  the  effects  of  error,  however,  have  been  incomplete 
and  would  lead  one  to  believe  that  any  system  built  from  current  technologies  would  at  best 
give  poor  results. 

In this chapter, I will analyze in detail the numerical stability of the motion algorithm,
which affects its sensitivity to error, and derive analytic expressions for the expected estimation
error in the case of both random and systematic errors in the data. I will also derive
the conditions under which the algorithm will converge to an alternate local minimum and
develop the theoretical basis of the ratio test to determine if the solution reported by the
algorithm is reliable. In the last section, I will compare the theoretical predictions of the
first part of the chapter with the results from simulations on data from artificially generated
motion sequences with varying amounts of added error.

Several  important  results  are  obtained  in  this  analysis.  The  first,  of  course,  is  the 
development  of  the  ratio  test  to  determine  reliability.  Just  as  important,  however,  is  the 
fact  that  the  error  analysis  provides  us  with  guidelines  for  designing  a  system  with  the 
required  sensor  resolution  and  field  of  view,  as  well  as  with  the  required  number  and  size 
of  matching  circuits,  to  obtain  a  given  maximum  expected  error  in  the  estimated  motion. 
Finally,  I  also  derive  an  interesting  practical  result  which  is  that  precise  internal  camera 
calibration  is  not  necessary  in  order  to  obtain  accurate  estimates  of  the  translation  direction, 
as long as the rotation is estimated as well.

Figure 8-1: Geometry of the position vector for a point on the image plane and cone of
all vectors for a given field of view.

8.1  Numerical  Stability 

In  order  to  understand  the  problems  of  numerical  instability,  we  must  have  analytical 
forms  for  the  matrices  C  and  A  which  are  used  in  the  procedures  to  update  b  and  q. 
Assuming  the  correspondence  points  are  distributed  uniformly  over  the  left  and  right  images, 
we  can  approximate  the  summation  of  equation  (7.3)  by  an  integration  over  the  field  of  view. 

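$$S = \sum_{i=1}^{N} w_i\lambda_i^2 \approx \frac{N}{\pi D^2}\int_0^{D}\!\!\int_0^{2\pi} w(\xi,\alpha)\,\lambda^2\,\xi\,d\alpha\,d\xi \qquad (8.1)$$

We assume that the correspondence points in the left image are contained within a circle
of radius D centered about the point (0, 0), as shown in Figure 8-1, and use the homogeneous
form (2.14) of representing ray directions. The vector $\boldsymbol{\ell}$ from the center of projection to a
point $(\xi\cos\alpha,\ \xi\sin\alpha)$ on the image plane is thus

$$\boldsymbol{\ell} = \begin{pmatrix} \xi\cos\alpha \\ \xi\sin\alpha \\ 1 \end{pmatrix} \qquad (8.2)$$

where $\xi \in [0, D]$ and $\alpha \in [0, 2\pi)$. Since we have implicitly set $f = 1$, the viewing angle $\phi$
is computed as

$$\phi = \tan^{-1} D \qquad (8.3)$$
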



The factor $N/\pi D^2$ in equation (8.1) takes into account the number of correspondence
points by representing them as a uniform density over the viewing field. It is not appropriate
to make the assumption that N scales proportionally to the viewing area, that is, as $D^2$,
because the number of pixels on the sensor, which ultimately limits N, is constant. If
we change the optics on the imaging system to give a wider field of view, the number of
correspondences will not increase significantly; the points will simply be distributed over a
larger area.

In  this  section  I  will  derive  the  analytical  forms  for  C  and  A  in  the  case  where  the  data 
are  error-free,  as  well  as  the  conditions  for  S  to  have  multiple  local  minima  corresponding 
to  feasible  solutions.  In  the  next  section  I  will  analyze  the  case  of  imperfect  data  and  study 
the  effects  of  error  on  the  reliability  of  the  motion  estimates. 


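so that

$$Z_{r_i}\mathbf{r}_i = Z_{\ell_i}\boldsymbol{\ell}'_i + \mathbf{b} \qquad (8.8)$$

Taking the cross product of both sides with $\boldsymbol{\ell}'_i$ we thus have

$$\boldsymbol{\ell}'_i \times \mathbf{r}_i = \frac{1}{Z_{r_i}}\left(\boldsymbol{\ell}'_i \times \mathbf{b}\right) \qquad (8.9)$$

Since there is no error in the data, we set $w_i = 1$ and write C as

$$C = \sum_{i=1}^{N} (\boldsymbol{\ell}'_i \times \mathbf{r}_i)(\boldsymbol{\ell}'_i \times \mathbf{r}_i)^T \qquad (8.10)$$

$$\phantom{C} = \sum_{i=1}^{N} \frac{1}{Z_{r_i}^2}(\boldsymbol{\ell}'_i \times \mathbf{b})(\boldsymbol{\ell}'_i \times \mathbf{b})^T \qquad (8.11)$$

$$\phantom{C} = -B_\times\left(\sum_{i=1}^{N} \frac{1}{Z_{r_i}^2}\,\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\right)B_\times \qquad (8.12)$$

where

$$B_\times = \begin{pmatrix} 0 & -b_z & b_y \\ b_z & 0 & -b_x \\ -b_y & b_x & 0 \end{pmatrix} \qquad (8.13)$$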

We  now  make  the  approximation  that  the  correspondences  are  distributed  uniformly 
over  the  image  and  replace  the  summation  of  equation  (8.12)  by  the  integral 

The distribution of depths, $Z_r$, of points in the scene is of course unknown. However,
given that we are interested only in analyzing the general structure of the matrix C for
different types of motion, we can consider $Z_r$ as a random variable whose probability distribution
is independent of $\xi$ and $\alpha$ and can therefore replace the $1/Z_r^2$ term by its expected
value and take it outside the integral. We then have

c  =  -kBx  jy  Bx  (8.15) 


where  k  has  been  defined  as 


(8.16) 


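This integral is now in the form of equation (B.3) whose solution is given in Appendix B,
equation (B.20). The result is

$$C = -\kappa N B_\times\left(\frac{D^2}{4}\left(I - \hat{v}_3\hat{v}_3^T\right) + \hat{v}_3\hat{v}_3^T\right)B_\times \qquad (8.17)$$

$$\phantom{C} = \kappa N\left(\frac{D^2}{4}\left(I - \mathbf{b}\mathbf{b}^T - (\mathbf{b}\times\hat{v}_3)(\mathbf{b}\times\hat{v}_3)^T\right) + (\mathbf{b}\times\hat{v}_3)(\mathbf{b}\times\hat{v}_3)^T\right) \qquad (8.18)$$

where $\hat{v}_3 = R\hat{z}$ represents the rotation of the optical axis in the left camera system.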

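By inspection, we can see that both $\mathbf{b}$ and $\mathbf{b}\times\hat{v}_3$ are eigenvectors of (8.18) with eigenvalues
$\mu_b = 0$ and

$$\mu_{b\times v_3} = \kappa N\left(\frac{D^2}{4}\left(1 - |\mathbf{b}\times\hat{v}_3|^2\right) + |\mathbf{b}\times\hat{v}_3|^2\right) \qquad (8.19)$$

$$\phantom{\mu_{b\times v_3}} = \kappa N\left(\frac{D^2}{4}(\mathbf{b}\cdot\hat{v}_3)^2 + |\mathbf{b}\times\hat{v}_3|^2\right) \qquad (8.20)$$

Consequently $(\mathbf{b}\times\hat{v}_3)\times\mathbf{b}$ is also an eigenvector with eigenvalue

$$\mu_{(b\times v_3)\times b} = \frac{\kappa N D^2}{4} \qquad (8.21)$$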


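Only the eigenvalue of $\mathbf{b}\times\hat{v}_3$ depends on the motion. Its extreme values are obtained
when $\mathbf{b}\perp\hat{v}_3$ and $\mathbf{b}\parallel\hat{v}_3$. If $\mathbf{b}\perp\hat{v}_3$ we have

$$\mu_{b\times v_3} = \kappa N \qquad (8.22)$$

For $D < 2$, or a viewing angle $\phi < 63.4°$, this will be the largest eigenvalue, and the ratio
of the second largest to the largest eigenvalues will be

$$\frac{\mu_2}{\mu_3} = \frac{D^2}{4} \qquad (8.23)$$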
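
The eigenvalue structure derived above can be checked numerically. The following Python sketch is only an illustration under simplifying assumptions that are not part of the derivation: the rotation is the identity (so that $\hat{v}_3 = \hat{z}$ and $\boldsymbol{\ell}' = \boldsymbol{\ell}$), all depths are set to $Z_r = 1$ (so that $\kappa = 1$), and the rays are drawn at random with uniform density over a disc of radius D. The function name is hypothetical.

```python
import numpy as np

def simulated_C_eigenvalues(D, b, n_points=200_000, seed=0):
    """Eigenvalues (ascending) of C/N built from rays sampled uniformly over a disc of radius D."""
    rng = np.random.default_rng(seed)
    xi = D * np.sqrt(rng.random(n_points))        # sqrt gives uniform density per unit area
    alpha = 2.0 * np.pi * rng.random(n_points)
    rays = np.stack([xi * np.cos(alpha), xi * np.sin(alpha), np.ones(n_points)], axis=1)
    c = np.cross(rays, np.asarray(b, float))      # l' x b for every ray, with Z_r = 1
    C = c.T @ c / n_points                        # sum of outer products divided by N
    return np.linalg.eigvalsh(C)

D = 1.0
mu = simulated_C_eigenvalues(D, b=(1.0, 0.0, 0.0))  # translation parallel to the image plane
ratio = mu[1] / mu[2]                               # should be close to D**2 / 4
```

For a translation along the x axis the three eigenvalues come out close to 0, $D^2/4$ and 1, in agreement with equations (8.21) and (8.22) when $\kappa = 1$.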


The numerical stability of determining the translation with known rotation is related
to this ratio. If it is close to zero, which is the ratio of the smallest to the largest
eigenvalue, adding error to the data can cause the two smallest eigenvalues to switch
places. In most practical situations $\hat{v}_3$ will be very close to $\hat{z}$. From equation (2.16), $\hat{v}_3$,
which is the third column of the rotation matrix R, is given by

$$\hat{v}_3 = \begin{pmatrix} \omega_x\omega_z(1-\cos\theta) + \omega_y\sin\theta \\ \omega_y\omega_z(1-\cos\theta) - \omega_x\sin\theta \\ \cos\theta + \omega_z^2(1-\cos\theta) \end{pmatrix} \qquad (8.24)$$

If $\theta = 0$ or $\hat{\omega} = \hat{z}$, $\hat{v}_3$ is identically equal to $\hat{z}$. If $\hat{\omega} \neq \hat{z}$ and $\theta$ is so large that the
approximation $\cos\theta \approx 1$ is not valid, the two cameras will not image the same scene.
The unstable case thus usually occurs when the motion is (nearly) parallel to the image
plane and the field of view is small, as reported by Daniilidis and Nagel [6]. The eigenvector
corresponding to the second largest eigenvalue is $(\mathbf{b}\times\hat{v}_3)\times\mathbf{b} = \hat{v}_3$. If the standard deviation
of the error in the data is greater than the difference between the two smallest eigenvalues,
we may find that the procedure Update_b() reports a translation direction which is close
to $\hat{z}$ instead of to $\hat{x}$. This is the instability which is most commonly observed in practice.
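
Equation (8.24) can be evaluated directly, as in the following small Python sketch. The function name is hypothetical; the formula is the third column of the rotation matrix given by Rodrigues' formula for a unit axis $\hat{\omega}$ and angle $\theta$.

```python
import math

def v3_from_axis_angle(omega, theta):
    """Third column of the rotation matrix for rotation by theta (radians) about omega."""
    wx, wy, wz = omega
    norm = math.sqrt(wx * wx + wy * wy + wz * wz)
    wx, wy, wz = wx / norm, wy / norm, wz / norm   # make sure the axis has unit length
    c, s = math.cos(theta), math.sin(theta)
    return (wx * wz * (1.0 - c) + wy * s,
            wy * wz * (1.0 - c) - wx * s,
            c + wz * wz * (1.0 - c))
```

For a 5° rotation about the y axis this gives (sin 5°, 0, cos 5°), which differs from $\hat{z}$ by less than 0.09 in any component.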

The other extreme case corresponds to $\mathbf{b} = \hat{v}_3$, or motion (nearly) parallel to the optical
axis. In this case, $\hat{v}_3$ is an eigenvector and so is any vector perpendicular to $\hat{v}_3$, since the
direction of $\mathbf{b}\times\hat{v}_3$ is undefined. The two largest eigenvalues are equal,

$$\mu_2 = \mu_3 = \frac{\kappa N D^2}{4} \qquad (8.25)$$

and thus the estimation of the translation direction is numerically stable, independently of
the field of view.


8.1.2  Condition  of  the  matrix  A 

The condition number of the matrix A, $K_A$, defined as the ratio of its largest and
smallest eigenvalues, is a measure of its nearness to singularity. Since the incremental
update to the rotation requires inverting A, $K_A$ is the critical parameter in determining
the numerical stability of this procedure.

In equation (7.16) we defined A as

$$A = \sum_{i=1}^{N} w_i\,\mathbf{a}_i\mathbf{a}_i^T \qquad (8.26)$$




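where

$$\mathbf{a}_i = (\mathbf{b}\times\mathbf{r}_i)\times\boldsymbol{\ell}'_i \qquad (8.27)$$

As before, we assume that the data are error-free so that $w_i = 1$ and we can write

$$Z_{r_i}\mathbf{r}_i = Z_{\ell_i}\boldsymbol{\ell}'_i + \mathbf{b} \qquad (8.28)$$

Taking the cross product of both sides with b we have

$$\mathbf{b}\times\mathbf{r}_i = \frac{Z_{\ell_i}}{Z_{r_i}}\left(\mathbf{b}\times\boldsymbol{\ell}'_i\right) \qquad (8.29)$$

and hence

$$\mathbf{a}_i = \frac{Z_{\ell_i}}{Z_{r_i}}\left(\mathbf{b}\times\boldsymbol{\ell}'_i\right)\times\boldsymbol{\ell}'_i \qquad (8.30)$$

$$\phantom{\mathbf{a}_i} = \frac{Z_{\ell_i}}{Z_{r_i}}\left((\mathbf{b}\cdot\boldsymbol{\ell}'_i)\,\boldsymbol{\ell}'_i - |\boldsymbol{\ell}'_i|^2\,\mathbf{b}\right) \qquad (8.31)$$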


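Substituting the above expression for $\mathbf{a}_i$ into equation (8.26), we obtain

$$A = \sum_{i=1}^{N}\left(\frac{Z_{\ell_i}}{Z_{r_i}}\right)^2\left[(\mathbf{b}\cdot\boldsymbol{\ell}'_i)^2\,\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i - |\boldsymbol{\ell}'_i|^2(\mathbf{b}\cdot\boldsymbol{\ell}'_i)\left(\mathbf{b}\boldsymbol{\ell}'^T_i + \boldsymbol{\ell}'_i\mathbf{b}^T\right) + |\boldsymbol{\ell}'_i|^4\,\mathbf{b}\mathbf{b}^T\right] \qquad (8.32)$$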


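If the distances to objects in the scene are large compared to the baseline length, the
terms $(Z_{\ell_i}/Z_{r_i})^2$ should be close to 1. In any case, we may assume they are random variables
which are independent of position and which may therefore be replaced by their expected
value. Let

$$\gamma = E\left[\left(\frac{Z_\ell}{Z_r}\right)^2\right] \qquad (8.33)$$

Then, making the approximation that the correspondences are distributed uniformly over
the image, equation (8.32) becomes

$$A = \frac{\gamma N}{\pi D^2}\int_0^{D}\!\!\int_0^{2\pi}\left[(\mathbf{b}\cdot\boldsymbol{\ell}')^2\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T - |\boldsymbol{\ell}'|^2(\mathbf{b}\cdot\boldsymbol{\ell}')\left(\mathbf{b}\boldsymbol{\ell}'^T + \boldsymbol{\ell}'\mathbf{b}^T\right) + |\boldsymbol{\ell}'|^4\,\mathbf{b}\mathbf{b}^T\right]\xi\,d\alpha\,d\xi \qquad (8.34)$$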


Each term on the right-hand side of this expression corresponds to one of the special
integrals computed in Appendix B. The solutions are as follows:


Jo  Jo 

- 1^  ^•2ww’^  +  |w|^  +  2bb'’-  +  I 


+  7rT)^(b  •  vsf  vsvs'^  1 


(8.35)

where

$$\mathbf{w} = (\hat{v}_3\times\mathbf{b})\times\hat{v}_3 = \mathbf{b} - (\mathbf{b}\cdot\hat{v}_3)\,\hat{v}_3 \qquad (8.36)$$

For the second term:


\£f{h  •  £')  (b£'^  +  £'bT)  ^d^da  =  jj  \tf  (bb^^T^  +  fT'^bb^)  ^  d^  da 

=  ^  j  bb'^  +  ~ 


and  for  the  last  term: 


(8.37) 


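$$\int_0^{D}\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^4\,\mathbf{b}\mathbf{b}^T\,\xi\,d\alpha\,d\xi = \pi D^2\left(1 + D^2 + \frac{D^4}{3}\right)\mathbf{b}\mathbf{b}^T \qquad (8.38)$$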


Combining  (8.35),  (8.37),  and  (8.38)  and  skipping  much  of  the  messy  algebra,  we  obtain 


+  —  (l--^j(l-5(b-U3)^)U3t’3  +  + 


(8.39) 


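Since

$$\mathbf{w}\cdot(\mathbf{b}\times\hat{v}_3) = \hat{v}_3\cdot(\mathbf{b}\times\hat{v}_3) = \mathbf{b}\cdot(\mathbf{b}\times\hat{v}_3) = 0 \qquad (8.40)$$

$\mathbf{b}\times\hat{v}_3$ is clearly an eigenvector of A with eigenvalue

$$\mu_{b\times v_3} = \frac{\gamma N D^2}{4}\left(\frac{D^2}{6} + \left(1 - \frac{D^2}{6}\right)(\mathbf{b}\cdot\hat{v}_3)^2\right) \qquad (8.41)$$

The other two eigenvectors are therefore linear combinations of $\mathbf{b}$ and $\hat{v}_3$ but do not
coincide exactly with either $\mathbf{b}$ or $\hat{v}_3$ unless $\mathbf{b}\perp\hat{v}_3$ or $\mathbf{b}\parallel\hat{v}_3$.

If $\mathbf{b}\perp\hat{v}_3$, then $\mathbf{w} = \mathbf{b}$ and A becomes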


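$$A = \gamma N\left[\left(1 + \frac{D^2}{2} + \frac{D^4}{12}\right)\mathbf{b}\mathbf{b}^T + \frac{D^4}{24}\,I + \frac{D^2}{4}\left(1 - \frac{D^2}{6}\right)\hat{v}_3\hat{v}_3^T\right] \qquad (8.42)$$

The eigenvectors and eigenvalues are then, in increasing order,

$$\mathbf{b}\times\hat{v}_3, \quad\text{with}\quad \mu_{b\times v_3} = \frac{\gamma N D^4}{24} \qquad (8.43)$$

$$\hat{v}_3, \quad\text{with}\quad \mu_{v_3} = \frac{\gamma N D^2}{4} \qquad (8.44)$$
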

$$\mathbf{b}, \quad\text{with}\quad \mu_b = \gamma N\left(1 + \frac{D^2}{2} + \frac{D^4}{8}\right) \qquad (8.45)$$

Note that this order holds for $D < \sqrt{6}$, or for a viewing angle $\phi < 67.8°$, at which point
$\mu_{b\times v_3} = \mu_{v_3}$. The condition number is given by

$$K_{A\perp} = \frac{\mu_b}{\mu_{b\times v_3}} = \frac{24}{D^4}\left(1 + \frac{D^2}{2} + \frac{D^4}{8}\right) \qquad (8.46)$$

As $D \to 0$, $K_{A\perp} \to \infty$, as we should expect. This reflects the well-known phenomenon,
which occurs with small fields of view, of interference between the displacement patterns
caused by a rotational motion and those caused by a translation parallel to the image
plane. As $D \to \infty$, $K_{A\perp}$ decreases toward a limiting value of 3; however, for values of
D which exist in real imaging systems, it remains quite large. For example, if $\phi < 60°$
($D^2 < 3$), $K_{A\perp} > 9.67$. Hence the estimate of the rotation will always be very sensitive to
error when $\mathbf{b}\cdot\hat{v}_3 = 0$.

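If $\mathbf{b} = \hat{v}_3$, $\mathbf{w} = 0$, and A becomes

$$A = \frac{\gamma N D^2}{4}\left(I - \hat{v}_3\hat{v}_3^T + \frac{4D^2}{3}\,\hat{v}_3\hat{v}_3^T\right) \qquad (8.47)$$

$\hat{v}_3$ is an eigenvector and so is any vector perpendicular to $\hat{v}_3$. The eigenvalues are

$$\mu_{v_3} = \frac{\gamma N D^4}{3} \qquad (8.48)$$

$$\mu_{\perp} = \frac{\gamma N D^2}{4} \qquad (8.49)$$

When $D^2 = 3/4$, or $\phi = 40.9°$, the eigenvalues are equal. For $D < \sqrt{3/4}$

$$K_{A\parallel} = \frac{\mu_{\perp}}{\mu_{v_3}} = \frac{3}{4D^2} \qquad (8.50)$$

while for $D > \sqrt{3/4}$

$$K_{A\parallel} = \frac{\mu_{v_3}}{\mu_{\perp}} = \frac{4D^2}{3} \qquad (8.51)$$

$K_{A\parallel}$ thus has a minimum at $D = \sqrt{3/4}$ and goes to infinity both as $D \to 0$ and as
$D \to \infty$. For $60° > \phi > 23.4°$, however, $K_{A\parallel} < 4$, and so for viewing fields used in most
real imaging systems, the estimation of the rotation will be robust.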
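
The behaviour of the condition number for translation along the optical axis can be tabulated with a few lines of Python. This sketch assumes the eigenvalues $\gamma N D^4/3$ (for $\hat{v}_3$) and $\gamma N D^2/4$ (for any perpendicular vector) as read from equations (8.48) and (8.49), so that $\gamma N$ cancels; the function names are hypothetical.

```python
import math

def cond_parallel(D):
    """Condition number of A when b is parallel to v3, for image radius D > 0."""
    ratio = 4.0 * D**2 / 3.0              # mu_v3 / mu_perp
    return ratio if ratio >= 1.0 else 1.0 / ratio

def view_angle_deg(D):
    """Viewing angle phi = atan(D) in degrees, for focal length f = 1."""
    return math.degrees(math.atan(D))

# Radii at which the condition number reaches 4 on either side of its minimum.
D_low, D_high = math.sqrt(3.0 / 16.0), math.sqrt(3.0)
angles = (view_angle_deg(D_low), view_angle_deg(D_high))
```

The two radii at which the condition number equals 4 correspond to viewing angles of about 23.4° and 60°, which are the limits quoted above.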

8.1.3  Minimizers  of  S 

Using the results derived in this section, we can now determine the conditions for S to
have more than one stationary point corresponding to a feasible, non-trivial solution. As
long as $D \neq 0$ and the motion is not a pure rotation, in which case C = 0, the smallest
eigenvalue of C given the true rotation is unique, and hence so is the solution for the baseline.
Let $\mathbf{b}_0$ and $R_0$ denote the baseline and rotation corresponding to the actual motion. If an
alternate minimum of S exists corresponding to the solutions $\mathbf{b}'$ and $R'$, it must be the case
that

$$R' \neq R_0 \qquad (8.52)$$

$$C'\mathbf{b}' = \mu'\mathbf{b}' \qquad (8.53)$$

$$\mathbf{h}' = 0 \qquad (8.54)$$

where $\mu'$ is the smallest eigenvalue of $C'$ and

$$C' = \sum_{i=1}^{N} (\boldsymbol{\ell}''_i\times\mathbf{r}_i)(\boldsymbol{\ell}''_i\times\mathbf{r}_i)^T \qquad (8.55)$$

with $\boldsymbol{\ell}''_i = R'\boldsymbol{\ell}_i$.

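We can always write $R'$ as $R' = \delta R\cdot R_0$ so that $\boldsymbol{\ell}''_i = \delta R\,\boldsymbol{\ell}'_i$. If $\delta R$ corresponds to an
incremental rotation of $\delta\theta$ about an axis $\hat{\eta}$, then we can use the small angle approximation
to Rodrigues' formula to obtain

$$\boldsymbol{\ell}''_i \approx \boldsymbol{\ell}'_i + \delta\theta\,(\hat{\eta}\times\boldsymbol{\ell}'_i) = \boldsymbol{\ell}'_i + \boldsymbol{\ell}'_i\times\mathbf{m} \qquad (8.56)$$

where $\mathbf{m} = -\delta\theta\,\hat{\eta}$. We can thus write

$$\lambda'_i = \lambda'_{i0} + \delta\lambda'_i, \qquad \mathbf{a}'_i = \mathbf{a}'_{i0} + \delta\mathbf{a}'_i \qquad (8.57)$$

where

$$\lambda'_{i0} = \mathbf{b}'\cdot(\boldsymbol{\ell}'_i\times\mathbf{r}_i) \qquad (8.58)$$

$$\mathbf{a}'_{i0} = (\mathbf{b}'\times\mathbf{r}_i)\times\boldsymbol{\ell}'_i \qquad (8.59)$$

$$\delta\lambda'_i = \mathbf{b}'\cdot\left((\boldsymbol{\ell}'_i\times\mathbf{m})\times\mathbf{r}_i\right), \qquad \delta\mathbf{a}'_i = (\mathbf{b}'\times\mathbf{r}_i)\times(\boldsymbol{\ell}'_i\times\mathbf{m}) \qquad (8.60)$$

Noting that

$$\delta\lambda'_i = -\left((\mathbf{b}'\times\mathbf{r}_i)\times\boldsymbol{\ell}'_i\right)\cdot\mathbf{m} = -\mathbf{a}'^T_{i0}\mathbf{m} \qquad (8.61)$$

the vector $\mathbf{h}'$ thus becomes

$$\mathbf{h}' = \sum_{i=1}^{N}\lambda'_i\mathbf{a}'_i = \sum_{i=1}^{N}\left(\lambda'_{i0} - \mathbf{a}'^T_{i0}\mathbf{m}\right)\left(\mathbf{a}'_{i0} + \delta\mathbf{a}'_i\right)$$

$$\phantom{\mathbf{h}'} = \sum_{i=1}^{N}\lambda'_{i0}\mathbf{a}'_{i0} - \sum_{i=1}^{N}\mathbf{a}'_{i0}\mathbf{a}'^T_{i0}\,\mathbf{m} + \sum_{i=1}^{N}\left(\lambda'_{i0} - \mathbf{a}'^T_{i0}\mathbf{m}\right)\delta\mathbf{a}'_i \qquad (8.62)$$

Let $S^* = \mu'$ denote the value of S given $\mathbf{b}'$ and $R'$, and let $C_0$ denote the matrix C
computed using $R_0$. From equation (7.14) we then have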


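$$S^* = \sum_{i=1}^{N}\left(\lambda'^2_{i0} - 2\lambda'_{i0}\mathbf{a}'^T_{i0}\mathbf{m} + \mathbf{m}^T\mathbf{a}'_{i0}\mathbf{a}'^T_{i0}\mathbf{m}\right) = \mathbf{b}'^TC_0\mathbf{b}' - 2\mathbf{h}_0^T\mathbf{m} + \mathbf{m}^TA_0\mathbf{m} \qquad (8.63)$$

where

$$\mathbf{h}_0 = \sum_{i=1}^{N}\lambda'_{i0}\mathbf{a}'_{i0}, \quad\text{and}\quad A_0 = \sum_{i=1}^{N}\mathbf{a}'_{i0}\mathbf{a}'^T_{i0} \qquad (8.64)$$

If $S^*$ is indeed a local minimum, then from equation (7.17), it must also be the case
that

$$\mathbf{h}_0 = A_0\mathbf{m} \qquad (8.65)$$

giving

$$S^* = \mathbf{b}'^TC_0\mathbf{b}' - \mathbf{m}^TA_0\mathbf{m} \qquad (8.66)$$

as well as

$$\mathbf{h}' = \sum_{i=1}^{N}\lambda'_i\,\delta\mathbf{a}'_i \qquad (8.67)$$

Defining

$$\delta\mathbf{h}' = \sum_{i=1}^{N}\lambda'_{i0}\,\delta\mathbf{a}'_i, \quad\text{and}\quad \delta A' = \sum_{i=1}^{N}\delta\mathbf{a}'_i\,\mathbf{a}'^T_{i0} \qquad (8.68)$$

we see that the necessary condition, $\mathbf{h}' = 0$, for S to have a stationary point can thus be
stated as

$$\mathbf{m} = (\delta A')^{-1}\,\delta\mathbf{h}' \qquad (8.69)$$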
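
Two steps of the derivation above lend themselves to a quick numerical check: the small-angle form of Rodrigues' formula in equation (8.56), and the triple-product identity of equation (8.61). The following Python sketch does this for one arbitrarily chosen set of vectors; the numerical values of `ell`, `r`, `b`, `eta` and `dtheta` are illustrative assumptions only, and `rodrigues` is a hypothetical helper that builds the exact rotation matrix.

```python
import numpy as np

def rodrigues(axis, angle):
    """Exact rotation matrix for a rotation by `angle` (radians) about `axis`."""
    k = np.asarray(axis, float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

ell = np.array([0.3, -0.2, 1.0])    # stands for l'_i
r = np.array([0.25, -0.15, 1.0])    # stands for r_i
b = np.array([1.0, 0.0, 0.0])       # stands for b'
eta = np.array([0.0, 1.0, 0.0])     # rotation axis
dtheta = 1e-3                       # incremental rotation angle in radians
m = -dtheta * eta

# Equation (8.56): first-order approximation of the rotated ray.
ell_exact = rodrigues(eta, dtheta) @ ell
ell_approx = ell + np.cross(ell, m)
approx_error = np.linalg.norm(ell_exact - ell_approx)   # second order in dtheta

# Equation (8.61): delta-lambda equals -a0 . m exactly.
a0 = np.cross(np.cross(b, r), ell)
dlam = b @ np.cross(np.cross(ell, m), r)
identity_gap = abs(dlam + a0 @ m)
```

The approximation error shrinks quadratically with `dtheta`, while `identity_gap` is zero up to rounding for any choice of the vectors.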

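We can now show at least one case, which occurs when $\mathbf{b}_0\perp\hat{v}_3$, where we know an
alternate solution exists. Since

$$\mathbf{r}_i = \frac{Z_{\ell_i}}{Z_{r_i}}\boldsymbol{\ell}'_i + \frac{1}{Z_{r_i}}\mathbf{b}_0 \qquad (8.70)$$

we can write $\mathbf{a}'_{i0}$ as

$$\mathbf{a}'_{i0} = \frac{Z_{\ell_i}}{Z_{r_i}}(\mathbf{b}'\times\boldsymbol{\ell}'_i)\times\boldsymbol{\ell}'_i - \frac{1}{Z_{r_i}}\,\mathbf{u}\times\boldsymbol{\ell}'_i \qquad (8.71)$$

where $\mathbf{u} = \mathbf{b}_0\times\mathbf{b}'$. Now let $\mathbf{b}' = \hat{v}_3$ and $\mathbf{m} = \alpha\,\mathbf{u}$, where $\alpha$ is some constant, so that

$$S^* = \hat{v}_3^TC_0\hat{v}_3 - \alpha^2\,\mathbf{u}^T\left[\sum_{i=1}^{N}\left(\frac{Z_{\ell_i}}{Z_{r_i}}\right)^2\left((\hat{v}_3\times\boldsymbol{\ell}'_i)\times\boldsymbol{\ell}'_i\right)\left((\hat{v}_3\times\boldsymbol{\ell}'_i)\times\boldsymbol{\ell}'_i\right)^T\right]\mathbf{u} \qquad (8.72)$$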


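The summation in the second term of this equation is identical in form to the one in
equation (8.32) whose analytic expression, derived by approximating the sum as an integral,
is given on the right-hand side of equation (8.39). Upon substituting $\hat{v}_3$ for $\mathbf{b}$ and using the
facts that $\mathbf{w}' = (\hat{v}_3\times\mathbf{b}')\times\hat{v}_3 = 0$ and $\hat{v}_3\cdot\mathbf{b}_0 = 0$, we obtain

$$\mathbf{u}^T\left[\sum_{i=1}^{N}\left(\frac{Z_{\ell_i}}{Z_{r_i}}\right)^2\left((\mathbf{b}'\times\boldsymbol{\ell}'_i)\times\boldsymbol{\ell}'_i\right)\left((\mathbf{b}'\times\boldsymbol{\ell}'_i)\times\boldsymbol{\ell}'_i\right)^T\right]\mathbf{u} = \frac{\gamma N D^2}{4} \qquad (8.73)$$

Since $\hat{v}_3$ is an eigenvector of $C_0$, when $\hat{v}_3\perp\mathbf{b}_0$, with eigenvalue

$$\mu_{v_3} = \frac{\kappa N D^2}{4} \qquad (8.74)$$

we thus have

$$S^* = \frac{N D^2}{4}\left(\kappa - \alpha^2\gamma\right) \qquad (8.75)$$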


If $\alpha^2 = \kappa/\gamma$, $S^*$ will be identically zero, and thus will clearly be a minimum. However,
we should keep in mind that the constants $\kappa$ and $\gamma$ are only approximations to values that
would be obtained if the Z coordinates of the points in the scene were known exactly, and
hence we cannot use equation (8.75) to find $\alpha$ by setting $S^* = 0$. To show that the solution
$\mathbf{b}' = \hat{v}_3$, $\mathbf{m} = \alpha\,(\mathbf{b}_0\times\hat{v}_3)$ minimizes S, we need to show that it satisfies the necessary
conditions (8.53) and (8.54).

We first check that $\mathbf{h}' = 0$ by expanding each of the terms in equation (8.67), applying
the rigid body motion equation (8.70) and the conditions $\mathbf{b}_0\cdot\hat{v}_3 = 0$ and $\mathbf{m} = \alpha\,\mathbf{b}_0\times\hat{v}_3$, to
give

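$$\lambda'_{i0} = \hat{v}_3\cdot(\boldsymbol{\ell}'_i\times\mathbf{r}_i) = \frac{1}{Z_{r_i}}\,\boldsymbol{\ell}'_i\cdot(\mathbf{b}_0\times\hat{v}_3) \qquad (8.76)$$

$$\mathbf{a}'^T_{i0}\mathbf{m} = \alpha\frac{Z_{\ell_i}}{Z_{r_i}}\left((\hat{v}_3\times\boldsymbol{\ell}'_i)\times\boldsymbol{\ell}'_i\right)\cdot(\mathbf{b}_0\times\hat{v}_3) = \alpha\frac{Z_{\ell_i}}{Z_{r_i}}(\hat{v}_3\cdot\boldsymbol{\ell}'_i)\,\boldsymbol{\ell}'_i\cdot(\mathbf{b}_0\times\hat{v}_3) \qquad (8.77)$$

$$\delta\mathbf{a}'_i = \alpha\,(\hat{v}_3\times\mathbf{r}_i)\times\left(\boldsymbol{\ell}'_i\times(\mathbf{b}_0\times\hat{v}_3)\right)$$

$$\phantom{\delta\mathbf{a}'_i} = \alpha\frac{Z_{\ell_i}}{Z_{r_i}}(\hat{v}_3\times\boldsymbol{\ell}'_i)\times\left(\boldsymbol{\ell}'_i\times(\mathbf{b}_0\times\hat{v}_3)\right) + \alpha\frac{1}{Z_{r_i}}(\hat{v}_3\times\mathbf{b}_0)\times\left(\boldsymbol{\ell}'_i\times(\mathbf{b}_0\times\hat{v}_3)\right)$$

$$\phantom{\delta\mathbf{a}'_i} = -\frac{\alpha}{Z_{r_i}}\left(1 + Z_{\ell_i}(\boldsymbol{\ell}'_i\cdot\mathbf{b}_0)\right)\boldsymbol{\ell}'_i + \frac{\alpha}{Z_{r_i}}\left(\boldsymbol{\ell}'_i\cdot(\mathbf{b}_0\times\hat{v}_3)\right)(\mathbf{b}_0\times\hat{v}_3) \qquad (8.78)$$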

We  can  thus  write 

N  y 

h'  =  -a  E  ^  (1  -  •  {-'i))  ((bo  X  V3)  ■ 

N  , 

-  a-  X]  ^  (1  -  otZcXh  ■  t'i))  (bo  X  V3fUie]{ho  x  i;3)(bo  X  V3)  (8.79) 

•  1 

t=l 

and then replace the sums by integrals, assuming the rays are uniformly distributed, to
obtain an analytic expression. Before writing the solution, however, we first use a result
shown at the ends of Sections B.5 and B.6, that substituting $(\hat{v}_3\cdot\boldsymbol{\ell}'_i) = 1$ in the above
equation does not affect the value of either integral. We also note from equation (B.28)
that the integral

$$\int_0^{D}\!\!\int_0^{2\pi}\left((\mathbf{b}_0\times\hat{v}_3)\cdot\boldsymbol{\ell}'\right)\boldsymbol{\ell}'\boldsymbol{\ell}'^T\mathbf{b}_0\,\xi\,d\alpha\,d\xi = 0 \qquad (8.80)$$

We  thus  have 


-an  (l  -  ttZr)  (bo  x  t;3)'’^^'£''^(bo  x  fi3)(bo  x  V3)  (  d^  da 

(l  -  aZc^  (bo  X  V3)  (8.81) 


where $\bar{Z}_\ell = E[Z_\ell]$.

The condition $\mathbf{h}' = 0$ can thus be satisfied by setting $\alpha = 1/\bar{Z}_\ell$. Again, this is a
convenient approximation; however, it does not change the fact that we can find a constant
$\alpha$ which satisfies $\mathbf{h}' = 0$ by setting


Eill  l/^r^.-(bo  X  V3fU,£'J{ho  X  h) 

Ell  ZeJZliv3■i'^)iho  x  hre'.e'Jiho  x  h) 


(8.82) 


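The second necessary condition for a minimum is that $\mathbf{b}' = \hat{v}_3$ must be the eigenvector
of $C'$ corresponding to its smallest eigenvalue. We write $C'$ as

$$C' = \sum_{i=1}^{N}\mathbf{c}'_i\mathbf{c}'^T_i = \sum_{i=1}^{N}(\mathbf{c}_{i0} + \delta\mathbf{c}_i)(\mathbf{c}_{i0} + \delta\mathbf{c}_i)^T$$

$$\phantom{C'} = C_0 + \sum_{i=1}^{N}\left(\mathbf{c}_{i0}\,\delta\mathbf{c}_i^T + \delta\mathbf{c}_i\,\mathbf{c}_{i0}^T\right) + \sum_{i=1}^{N}\delta\mathbf{c}_i\,\delta\mathbf{c}_i^T \qquad (8.83)$$

where

$$\mathbf{c}_{i0} = \boldsymbol{\ell}'_i\times\mathbf{r}_i, \qquad \delta\mathbf{c}_i = \alpha\left(\boldsymbol{\ell}'_i\times(\mathbf{b}_0\times\hat{v}_3)\right)\times\mathbf{r}_i \qquad (8.84)$$

Using the rigid body motion equation (8.70), we expand $\mathbf{c}_{i0}$ as

$$\mathbf{c}_{i0} = -\frac{1}{Z_{r_i}}B_{\times 0}\,\boldsymbol{\ell}'_i \qquad (8.85)$$

where $B_{\times 0}$ is the cross-product matrix corresponding to the operation $\mathbf{b}_0\times$, and in a similar
manner write $\delta\mathbf{c}_i$ as

$$\delta\mathbf{c}_i = \frac{\alpha}{Z_{r_i}}\left[Z_{\ell_i}\left(|\boldsymbol{\ell}'_i|^2 I - \boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\right) + (\mathbf{b}_0\cdot\boldsymbol{\ell}'_i)\,I\right](\mathbf{b}_0\times\hat{v}_3) \qquad (8.86)$$

We thus have

$$\mathbf{c}_{i0}\,\delta\mathbf{c}_i^T = -\frac{\alpha}{Z_{r_i}^2}B_{\times 0}\left[\left(Z_{\ell_i}|\boldsymbol{\ell}'_i|^2\boldsymbol{\ell}'_i + \boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\mathbf{b}_0\right)(\mathbf{b}_0\times\hat{v}_3)^T - Z_{\ell_i}\left(\boldsymbol{\ell}'_i\cdot(\mathbf{b}_0\times\hat{v}_3)\right)\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\right] \qquad (8.87)$$

$$\delta\mathbf{c}_i\,\delta\mathbf{c}_i^T = \frac{\alpha^2}{Z_{r_i}^2}\Big[\left(Z_{\ell_i}^2|\boldsymbol{\ell}'_i|^4 + 2Z_{\ell_i}|\boldsymbol{\ell}'_i|^2\boldsymbol{\ell}'^T_i\mathbf{b}_0 + \mathbf{b}_0^T\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\mathbf{b}_0\right)(\mathbf{b}_0\times\hat{v}_3)(\mathbf{b}_0\times\hat{v}_3)^T$$

$$\qquad - Z_{\ell_i}^2|\boldsymbol{\ell}'_i|^2\left(\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i(\mathbf{b}_0\times\hat{v}_3)(\mathbf{b}_0\times\hat{v}_3)^T + (\mathbf{b}_0\times\hat{v}_3)(\mathbf{b}_0\times\hat{v}_3)^T\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\right)$$

$$\qquad - Z_{\ell_i}\left(\boldsymbol{\ell}'_i\cdot(\mathbf{b}_0\times\hat{v}_3)\right)\left(\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\mathbf{b}_0(\mathbf{b}_0\times\hat{v}_3)^T + (\mathbf{b}_0\times\hat{v}_3)\mathbf{b}_0^T\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\right)$$

$$\qquad + Z_{\ell_i}^2\left(\boldsymbol{\ell}'_i\cdot(\mathbf{b}_0\times\hat{v}_3)\right)^2\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\Big] \qquad (8.88)$$

The above equations are written so that each term can be identified with one of the special
integrals solved in Appendix B. Upon approximating the summations in equation (8.83)
as integrals and using the fact that $\mathbf{b}_0\cdot\hat{v}_3 = 0$, the solutions are found to be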

N  ’  /  j^2  \  jj2 

+  ScicJo^  — >  -2anZfN  I  l-f  -^  |  (bo  X  •i)3)(bo  X 

(8.89) 


^SciSci 


a^N  9  (  1  1  +  (^0  ^  ^’3)(bo  X  vs)^ 

o  D'^  o  D'^  f  D^\  ,  ,  T 

+  +  0''yN—  (  1 - ^  1  v^vs'- 


(8.90) 


It is now clear that $C'$ has the same eigenvectors as $C_0$, namely $\mathbf{b}_0$, $\hat{\mathbf{v}}_3$, and $\mathbf{b}_0 \times \hat{\mathbf{v}}_3$.
From equations (8.21), (8.20), (8.83), (8.89), and (8.90), the eigenvalues are given by

$$\mu'_{b} = 0 \tag{8.91}$$


,  ND^  (  2  ( 


(8.92) 


_  /  n2\  r)2  f  jn2  n4 

^'bxh3  =  '^^  l-2QZr  M  +  —  J  -l-a2_  q_  o,2^jV  h  +  _  +  _ 


(8.93) 


The  only  difference  between  equations  (8.92)  and  (8.75)  is  the  factor  {2K,Z(Ja  —  7) 
multiplying  0^7,  which  arises  from  the  different  manner  in  which  intermediate  terms  were 
grouped  in  deriving  these  equations.  Recalling  that  k,  7,  and  are  only  convenient  symbols 
representing  unknown  values,  differences  in  terms  involving  products  of  these  constants 
should not necessarily be considered significant. We note that if the distances in the image
are constant and equal, i.e., $Z_{\ell_i} = Z_{r_i} = Z$ for $i = 1, \ldots, N$, we can set $a = 1/Z$ giving

$$\frac{2\kappa Z}{a} - \gamma = 1 \tag{8.94}$$
and hence

$$\mu'_{\hat{v}_3} = 0 \tag{8.95}$$

We nonetheless expect $\mu'_{\hat{v}_3} \approx 0$ for many distributions of depths in the scene when $\mathbf{b}_0 \perp \hat{\mathbf{v}}_3$.
Given the fact that the two smallest eigenvalues of $C'$ are both very close and both much
less than $\mu'_{b \times \hat{v}_3}$, we can see that with the inevitable errors in the data, it can easily happen
that $\mu'_{\hat{v}_3} < \mu'_{b}$.

From  this  derivation,  we  now  see  the  basis  for  the  test  presented  in  the  last  chapter 
to  determine  if  the  algorithm  has  fallen  into  an  alternate  minimum.  If  it  is  true  that 
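$\mathbf{b}_0 \parallel \hat{\mathbf{v}}_3$, then, as we saw previously in equation (8.25), the ratio of the middle to the largest
eigenvalues is

$$\frac{\mu_2}{\mu_3} \approx 1 \tag{8.96}$$

If on the other hand $\mathbf{b}_0 \perp \hat{\mathbf{v}}_3$ and $\mathbf{b}' = \hat{\mathbf{v}}_3$, then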

(8.97) 

A^s 

The  ratio  test  is  simple  to  compute  and,  based  on  numerous  experiments,  has  proven 
to  be  an  extremely  reliable  indicator  of  a  false  solution.  It  has  consistently  outperformed 
other  measures  which  can  be  readily  obtained  from  the  data,  such  as  S  or  Kc,  the  condition 
number  of  C. 


8.2  Error  Analysis 

Having analyzed the numerical stability of the procedures for estimating translation and
rotation as a function of both the size of the viewing field and the type of motion, we can
now quantify more precisely the robustness of the estimates as a function of the error in
the data.

Errors in the data may arise from both random and systematic sources. Systematic
error can usually be attributed to poor calibration of the imaging system, while random
errors result from the finite resolution of the image sensor and from approximations made
in the matching procedure. Since the sensor is discretized, each pixel subtends a finite solid
angle, which is approximated by a single vector from the center of projection to the center
of the pixel. The block-matching procedure compounds the error due to the finite pixel size
by assigning the correspondence to the centers of the best matching blocks.

Figure 8-2: Change in $\mathbf{r}_i$ caused by error in determining the exact location of the
correspondence with the left image.

The combined result of both random and systematic errors is a displacement in the
image plane of the estimated correspondence point location from its true position. We will
assume that the position vectors of the feature points in the left image are known exactly
and model the error in the corresponding $\mathbf{r}_i$ by a vector $\delta\mathbf{r}_i$ which is added to the correct
vector $\mathbf{r}_{i0}$,

$$\mathbf{r}_i = \mathbf{r}_{i0} + \delta\mathbf{r}_i \tag{8.98}$$

as shown in Figure 8-2. Since the error, whether systematic or random, is assumed to be
uniform over the image, we again set the weights $w_i = 1$ in all of the following derivations.

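We will consider the cases of random and systematic errors separately. For random
errors we assume that the vectors $\delta\mathbf{r}_i$ are independent and identically distributed, since
each block is matched independently of the others, and since neither the pixel size nor the
block size used in the matching procedure is a function of position in the image plane. We
write $\delta\mathbf{r}_i$ as

$$\delta\mathbf{r}_i = \begin{pmatrix} \rho_i \cos\beta_i \\ \rho_i \sin\beta_i \\ 0 \end{pmatrix} \tag{8.99}$$

where $\beta_i$ is uniformly distributed over $[0, 2\pi)$ and $\rho_i$ has some probability distribution over
$[0, R_{max}]$ with $E[\rho_i^2] = \sigma^2$. We thus have

$$E[\delta\mathbf{r}_i] = 0 \tag{8.100}$$

and

$$E[\delta\mathbf{r}_i^T \delta\mathbf{r}_j] = \begin{cases} \sigma^2, & i = j \\ 0, & i \neq j \end{cases} \tag{8.101}$$

while the covariance matrix, $\Lambda$, is given by

$$\Lambda = E[\delta\mathbf{r}_i\, \delta\mathbf{r}_j^T] = \begin{cases} \dfrac{\sigma^2}{2}\left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right), & i = j \\ 0, & i \neq j \end{cases} \tag{8.102}$$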
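The random error model of equations (8.99) to (8.102) can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, and the radial law $\rho = R_{max}\sqrt{u}$ (errors uniform over a disk, giving $E[\rho^2] = R_{max}^2/2$) is one assumed choice among the distributions the text allows. It estimates the sample mean and second-moment matrix of the error vectors, which should approach zero and $(\sigma^2/2)\,\mathrm{diag}(1, 1, 0)$ respectively.

```python
import math
import random

def sample_error(r_max, rng):
    """Draw one image-plane error vector (rho cos(beta), rho sin(beta), 0)."""
    beta = rng.uniform(0.0, 2.0 * math.pi)   # direction, uniform over [0, 2*pi)
    rho = r_max * math.sqrt(rng.random())    # assumption: uniform over a disk
    return (rho * math.cos(beta), rho * math.sin(beta), 0.0)

def empirical_moments(r_max, n_samples, seed=0):
    """Return the sample mean vector and the 3x3 second-moment matrix."""
    rng = random.Random(seed)
    mean = [0.0, 0.0, 0.0]
    second = [[0.0] * 3 for _ in range(3)]
    for _ in range(n_samples):
        d = sample_error(r_max, rng)
        for a in range(3):
            mean[a] += d[a] / n_samples
            for b in range(3):
                second[a][b] += d[a] * d[b] / n_samples
    return mean, second

r_max = 0.02                      # hypothetical value, in focal-length units
sigma2 = r_max ** 2 / 2.0         # E[rho^2] under the disk assumption
mean, second = empirical_moments(r_max, 100_000)
```

With this choice the diagonal entries `second[0][0]` and `second[1][1]` should both be close to `sigma2 / 2`, and the third row and column are exactly zero because the error lies in the image plane.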

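Systematic errors are modeled by a constant vector $\delta\mathbf{r}$ added to all of the vectors $\mathbf{r}_{i0}$.
In this case we have

$$E[\delta\mathbf{r}] = \rho\,\delta\hat{\mathbf{r}}, \quad \text{and} \quad E[\delta\mathbf{r}^T \delta\mathbf{r}] = \rho^2 \tag{8.103}$$

Since the error is constant, of course, its variance is zero.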

We  now  examine  the  first-order  effects  of  including  these  error  terms  on  the  estimates 
of  the  baseline  and  the  rotation. 

8.2.1  First-order  error  in  the  baseline  (rotation  known) 

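We consider first the case where the rotation is known exactly. From our definition of
the error in equation (8.99), we can write the vectors $\mathbf{c}_i$, defined in (7.5), as

$$
\begin{aligned}
\mathbf{c}_i &= \boldsymbol{\ell}'_i \times \mathbf{r}_i \\
             &= \boldsymbol{\ell}'_i \times \mathbf{r}_{i0} + \boldsymbol{\ell}'_i \times \delta\mathbf{r}_i \\
             &= \mathbf{c}_{i0} + \delta\mathbf{c}_i
\end{aligned}
\tag{8.104}
$$

We also have

$$\lambda_i = \mathbf{c}_i \cdot \mathbf{b} = \delta\mathbf{c}_i \cdot \mathbf{b} \tag{8.105}$$

since $\mathbf{c}_{i0} \cdot \mathbf{b} = 0$ by the definition of $\mathbf{c}_{i0}$ as the error-free vector $(\boldsymbol{\ell}'_i \times \mathbf{r}_{i0})$. We thus write $C$
as

$$
\begin{aligned}
C &= \sum_{i=1}^{N} \mathbf{c}_{i0}\mathbf{c}_{i0}^T + \sum_{i=1}^{N} \left(\mathbf{c}_{i0}\,\delta\mathbf{c}_i^T + \delta\mathbf{c}_i\,\mathbf{c}_{i0}^T + \delta\mathbf{c}_i\,\delta\mathbf{c}_i^T\right) \\
  &= C_0 + \Delta C
\end{aligned}
\tag{8.106}
$$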


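Dropping the last term, which is second-order in the error, we then write

$$\Delta C \approx \sum_{i=1}^{N} \left(\mathbf{c}_{i0}\,\delta\mathbf{c}_i^T + \delta\mathbf{c}_i\,\mathbf{c}_{i0}^T\right) \tag{8.107}$$

Let $\mathbf{b}_\delta$ denote the eigenvector corresponding to the smallest eigenvalue of $C$, and let $\mathbf{b}_1$,
$\mathbf{b}_2$, and $\mathbf{b}_3$ represent the eigenvectors of $C_0$ with eigenvalues $\mu_1$, $\mu_2$, and $\mu_3$, where

$$\mu_3 > \mu_2 > \mu_1 = 0 \tag{8.108}$$

$$|\mathbf{b}_1| = |\mathbf{b}_2| = |\mathbf{b}_3| = 1 \tag{8.109}$$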


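Using a result from matrix perturbation theory [91], we can express $\mathbf{b}_\delta$ by a first-order
Taylor series expansion in terms of $\Delta C$ and the unperturbed eigenvectors and eigenvalues
of $C_0$ as

$$\mathbf{b}_\delta \approx \mathbf{b}_1 - \frac{\mathbf{b}_2^T \Delta C\, \mathbf{b}_1}{\mu_2}\,\mathbf{b}_2 - \frac{\mathbf{b}_3^T \Delta C\, \mathbf{b}_1}{\mu_3}\,\mathbf{b}_3 \tag{8.110}$$

$$\phantom{\mathbf{b}_\delta} = \mathbf{b}_1 + \delta\mathbf{b}_1 \tag{8.111}$$

The error vector $\delta\mathbf{b}_1$ is perpendicular to $\mathbf{b}_1$, and its magnitude, given by

$$|\delta\mathbf{b}_1| = \sqrt{\left(\frac{\mathbf{b}_2^T \Delta C\, \mathbf{b}_1}{\mu_2}\right)^2 + \left(\frac{\mathbf{b}_3^T \Delta C\, \mathbf{b}_1}{\mu_3}\right)^2} \tag{8.112}$$

approximates the angle, $\theta_b$, between $\mathbf{b}_1$ and $\mathbf{b}_\delta$.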
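The first-order eigenvector shift can be compared against a direct numerical computation. The following Python sketch makes several assumptions that are not in the text: $C_0$ is taken to be diagonal with hypothetical eigenvalues $(0, 2, 5)$, $\Delta C$ is an arbitrary small symmetric matrix, and the "exact" eigenvector is found by power iteration on $\mathrm{tr}(C)\,I - C$, which is a stand-in for whatever eigen-solver is actually used. All function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def first_order_shift(delta_c, b1, b2, b3, mu2, mu3):
    """First-order shift of b1 (eigenvalue mu1 = 0) caused by delta_c."""
    w = matvec(delta_c, b1)
    k2 = -dot(b2, w) / mu2
    k3 = -dot(b3, w) / mu3
    return [k2 * p + k3 * q for p, q in zip(b2, b3)]

def smallest_eigvec(c, start, iters=500):
    """Power iteration on (trace(c) I - c); its dominant eigenvector is the
    eigenvector of c with the smallest eigenvalue when all eigenvalues of c
    lie below trace(c), as they do in the example below."""
    shift = c[0][0] + c[1][1] + c[2][2]
    v = list(start)
    for _ in range(iters):
        cv = matvec(c, v)
        v = [shift * a - b for a, b in zip(v, cv)]
        n = math.sqrt(dot(v, v))
        v = [a / n for a in v]
    return v

# Hypothetical unperturbed matrix and perturbation.
b1, b2, b3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
mu2, mu3 = 2.0, 5.0
c0 = [[0.0, 0.0, 0.0], [0.0, mu2, 0.0], [0.0, 0.0, mu3]]
delta_c = [[0.001, 0.010, -0.020],
           [0.010, 0.000, 0.003],
           [-0.020, 0.003, 0.002]]
c = [[c0[i][j] + delta_c[i][j] for j in range(3)] for i in range(3)]

delta_b = first_order_shift(delta_c, b1, b2, b3, mu2, mu3)
b_pred = [p + q for p, q in zip(b1, delta_b)]      # first-order prediction
b_num = smallest_eigvec(c, b1)                     # numerical eigenvector
theta_pred = math.sqrt(dot(delta_b, delta_b))      # predicted angle, radians
```

For a perturbation of this size the predicted and numerically computed eigenvectors differ only by terms of second order in $\Delta C$.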

The important quantity in determining the magnitude of the error is clearly the vector
$\Delta C\,\mathbf{b}_1$. From (8.106) we have

$$
\begin{aligned}
\Delta C\,\mathbf{b}_1 &= \sum_{i=1}^{N} \left(\mathbf{c}_{i0}\,\delta\mathbf{c}_i^T + \delta\mathbf{c}_i\,\mathbf{c}_{i0}^T\right)\mathbf{b}_1 \\
                       &= \sum_{i=1}^{N} \lambda_i\,\mathbf{c}_{i0}
\end{aligned}
\tag{8.113}
$$

Expressions for the estimation error in the cases of random and systematic measurement
errors will now be derived separately.


Uncorrelated  random  error 

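When the error is uncorrelated, we have from equations (8.105) and (8.100)

$$
\begin{aligned}
E[\Delta C\,\mathbf{b}_1] &= \sum_{i=1}^{N} E[\lambda_i]\,\mathbf{c}_{i0} \\
                          &= \sum_{i=1}^{N} (\mathbf{b}_1 \times \boldsymbol{\ell}'_i)^T E[\delta\mathbf{r}_i]\,\mathbf{c}_{i0} = 0
\end{aligned}
\tag{8.114}
$$

giving also,

$$E[\delta\mathbf{b}_1] = 0 \tag{8.115}$$

The fact that the expected value of $\delta\mathbf{b}_1$ is zero only means that it has no preferred
direction. The appropriate measure of the error is the magnitude of $\delta\mathbf{b}_1$ which is also, to
first order, the angle $\theta_b$ between $\mathbf{b}_1$ and $\mathbf{b}_\delta$. We thus compute

$$\overline{\theta_b^2} = E[\theta_b^2] = \frac{1}{\mu_2^2}\,\mathbf{b}_2^T E\!\left[\Delta C\,\mathbf{b}_1\mathbf{b}_1^T \Delta C\right]\mathbf{b}_2 + \frac{1}{\mu_3^2}\,\mathbf{b}_3^T E\!\left[\Delta C\,\mathbf{b}_1\mathbf{b}_1^T \Delta C\right]\mathbf{b}_3 \tag{8.116}$$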

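From equation (8.113), we can write the term $\Delta C\,\mathbf{b}_1\mathbf{b}_1^T \Delta C$ as

$$\Delta C\,\mathbf{b}_1\mathbf{b}_1^T \Delta C = \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_i \lambda_j\,\mathbf{c}_{i0}\mathbf{c}_{j0}^T \tag{8.117}$$

and hence, by independence of the errors,

$$
\begin{aligned}
E\!\left[\Delta C\,\mathbf{b}_1\mathbf{b}_1^T \Delta C\right] &= \sum_{i=1}^{N}\sum_{j=1}^{N} E[\lambda_i \lambda_j]\,\mathbf{c}_{i0}\mathbf{c}_{j0}^T \\
 &= \sum_{i=1}^{N} E[\lambda_i^2]\,\mathbf{c}_{i0}\mathbf{c}_{i0}^T
\end{aligned}
\tag{8.118}
$$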

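We could approximate this sum as an integral, following the approach taken in Section 8.1.1,
and derive an exact expression for $E[\Delta C\,\mathbf{b}_1\mathbf{b}_1^T \Delta C]$. However, we can gain more
insight into the problem by making the approximation that the term $E[\lambda_i^2]$ can be replaced
by its average value $\overline{\lambda^2}$, where

$$\overline{\lambda^2} = \frac{1}{N}\sum_{i=1}^{N} E[\lambda_i^2] \tag{8.119}$$

Making the substitution in equation (8.118), we have

$$
\begin{aligned}
E\!\left[\Delta C\,\mathbf{b}_1\mathbf{b}_1^T \Delta C\right] &= \sum_{i=1}^{N} \overline{\lambda^2}\,\mathbf{c}_{i0}\mathbf{c}_{i0}^T \\
 &= \overline{\lambda^2}\,C_0
\end{aligned}
\tag{8.120}
$$

and hence the error $\overline{\theta_b^2}$ becomes

$$
\begin{aligned}
\overline{\theta_b^2} &= \frac{\overline{\lambda^2}}{\mu_2^2}\,\mathbf{b}_2^T C_0\,\mathbf{b}_2 + \frac{\overline{\lambda^2}}{\mu_3^2}\,\mathbf{b}_3^T C_0\,\mathbf{b}_3 \\
 &= \overline{\lambda^2}\left(\frac{1}{\mu_2} + \frac{1}{\mu_3}\right)
\end{aligned}
\tag{8.121}
$$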
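Equation (8.121) is simple enough to evaluate directly. The Python sketch below is a hedged illustration: the function name is hypothetical, and the numbers plugged in are assumptions chosen for the degenerate case treated later in this section, where both nonzero eigenvalues equal $\kappa N D^2/4$. The value used for $\overline{\lambda^2}$ ($\sigma^2 D^2/4$) is likewise an assumed placeholder for that case with the optical axis aligned with the baseline, not a result quoted from the text.

```python
import math

def expected_sq_baseline_error(lambda_sq_mean, mu2, mu3):
    """Expected squared angular error of the baseline from (8.121), in rad^2."""
    return lambda_sq_mean * (1.0 / mu2 + 1.0 / mu3)

# Hypothetical inputs (focal-length units for sigma, baseline units for depth).
sigma = 0.01
kappa = 0.0062
n_points = 50
d = math.tan(math.radians(20.0))

mu = kappa * n_points * d ** 2 / 4.0         # repeated nonzero eigenvalue
lambda_sq_mean = sigma ** 2 * d ** 2 / 4.0   # assumed average of E[lambda_i^2]

theta_sq = expected_sq_baseline_error(lambda_sq_mean, mu, mu)
theta_deg = math.degrees(math.sqrt(theta_sq))
```

Under these assumptions the factors of $D^2$ cancel, so `theta_sq` reduces to `2 * sigma**2 / (kappa * n_points)` regardless of the field of view.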

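We now need only to find an expression for $\overline{\lambda^2}$. From equations (8.102) and (8.105), the
expected value of $\lambda_i^2$ is

$$
\begin{aligned}
E[\lambda_i^2] &= E\!\left[(\mathbf{b}_1 \times \boldsymbol{\ell}'_i)^T \delta\mathbf{r}_i\,\delta\mathbf{r}_i^T (\mathbf{b}_1 \times \boldsymbol{\ell}'_i)\right] \\
 &= \frac{\sigma^2}{2}\,(\mathbf{b}_1 \times \boldsymbol{\ell}'_i)^T \left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right)(\mathbf{b}_1 \times \boldsymbol{\ell}'_i) \\
 &= \frac{\sigma^2}{2}\left[|\boldsymbol{\ell}'_i|^2 - (\boldsymbol{\ell}'_i \cdot \mathbf{b}_1)^2 - \left((\mathbf{b}_1 \times \hat{\mathbf{z}}) \cdot \boldsymbol{\ell}'_i\right)^2\right]
\end{aligned}
\tag{8.122}
$$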
Substituting this equation into (8.119) and approximating the sum by an integral we have

_  ^2  /‘D  f'2'K  p  1 

Then,  using  the  results  of  Appendix  B  and  simplifying,  we  obtain 

|(bi  X  hs)  X  ^  (l  +  (bi  •  v^f  -  |(bi  X  2)  X  hsp)  (8.124) 

Combining this expression with the formulas for the eigenvalues of $C_0$ given in equations
(8.20) and (8.21), we can write $\overline{\theta_b^2}$ in its most general form as

H  =  ‘^  |(bi  X  Us)  X  ^  (l  +  (bi  •ha)^  -  |(bi  X  i)  X  hsp)  • 

_ 1 _ 

T)2  ^  T>2(bi  •  ha)^  +  4|bi  x  haP 

We observe that $\overline{\theta_b^2}$ is proportional to the error variance, $\sigma^2$, and to the squared distances
of objects in the scene. (Recall that $\kappa = E[1/Z^2]$.) We can also see that $\overline{\theta_b^2} \to 0$ as
$N \to \infty$, as should be expected for an unbiased estimator. The behavior of $\overline{\theta_b^2}$ as a function
of $D$, however, depends on the orientation of $\mathbf{b}_1$ with respect to $\hat{\mathbf{v}}_3$. When $\mathbf{b}_1 \perp \hat{\mathbf{v}}_3$,
equation (8.125) reduces to

n  =  I(bix63)xi|n^|ixi3!^  (4  +  oq  (8.126) 

As $D \to 0$ the term in $1/D^2$ dominates giving

H  —  l(bi  X  *3)  X  ip  (8.127) 

so that $\overline{\theta_b^2} \to \infty$ as the field of view decreases. As $D$ increases, however,

^  ^  l(bi  x  ha)  X  hp  +  +  x)  1-^'  "" 

Since $\hat{\mathbf{v}}_3$ is usually very close to $\hat{\mathbf{z}}$ we can neglect the term $|\hat{\mathbf{z}} \times \hat{\mathbf{v}}_3|^2$ for all reasonable
values of $D$. We thus expect the error to become constant for large fields of view.


(8.125) 
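When $\mathbf{b}_1 = \hat{\mathbf{v}}_3$, any vector perpendicular to $\hat{\mathbf{v}}_3$ is an eigenvector of $C_0$ with eigenvalue
$\mu = \kappa N D^2/4$. The expected squared error thus becomes

$$\overline{\theta_b^2} = \frac{\sigma^2}{\kappa N}\left(2 - |\hat{\mathbf{z}} \times \hat{\mathbf{v}}_3|^2\right) \tag{8.129}$$

and is now completely independent of the field of view.


Systematic error

When the error vector $\delta\mathbf{r}$ is constant, equation (8.113) becomes

$$
\begin{aligned}
\Delta C\,\mathbf{b}_1 &= \sum_{i=1}^{N} \lambda_i\,\mathbf{c}_{i0} \\
 &= \left[\sum_{i=1}^{N} \mathbf{c}_{i0}\,(\mathbf{b}_1 \times \boldsymbol{\ell}'_i)^T\right]\delta\mathbf{r}
\end{aligned}
\tag{8.130}
$$

Using the rigid body motion equation (8.9), we can write

$$\mathbf{b}_1 \times \boldsymbol{\ell}'_i = -Z_{r_i}\,(\boldsymbol{\ell}'_i \times \mathbf{r}_{i0}) \tag{8.131}$$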


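and hence

$$\Delta C\,\mathbf{b}_1 = -\left[\sum_{i=1}^{N} Z_{r_i}\,\mathbf{c}_{i0}\mathbf{c}_{i0}^T\right]\delta\mathbf{r} \tag{8.132}$$

We now approximate that the depths can be replaced by their average value $\bar{Z}_r$ so that

$$
\begin{aligned}
\Delta C\,\mathbf{b}_1 &= -\bar{Z}_r\left[\sum_{i=1}^{N} \mathbf{c}_{i0}\mathbf{c}_{i0}^T\right]\delta\mathbf{r} \\
 &= -\bar{Z}_r\,C_0\,\delta\mathbf{r}
\end{aligned}
\tag{8.133}
$$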


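Substituting this expression into equation (8.110), we find that the estimated value of
the baseline is given by

$$
\begin{aligned}
\mathbf{b}_\delta &\approx \mathbf{b}_1 + \bar{Z}_r\left(\frac{\mathbf{b}_2^T C_0\,\delta\mathbf{r}}{\mu_2}\right)\mathbf{b}_2 + \bar{Z}_r\left(\frac{\mathbf{b}_3^T C_0\,\delta\mathbf{r}}{\mu_3}\right)\mathbf{b}_3 \\
 &= \mathbf{b}_1 + \bar{Z}_r\left((\mathbf{b}_2 \cdot \delta\mathbf{r})\,\mathbf{b}_2 + (\mathbf{b}_3 \cdot \delta\mathbf{r})\,\mathbf{b}_3\right)
\end{aligned}
\tag{8.134}
$$

which can also be written as

$$\mathbf{b}_\delta = \mathbf{b}_1 + \bar{Z}_r\left(\delta\mathbf{r} - (\mathbf{b}_1 \cdot \delta\mathbf{r})\,\mathbf{b}_1\right) \tag{8.135}$$

This result makes perfect sense, of course, since it states that the adjustment to the
baseline is along the component of $\delta\mathbf{r}$ which is perpendicular to $\mathbf{b}_1$. The angular error, $\theta_b$,
is given by

$$
\begin{aligned}
\theta_b &\approx |\delta\mathbf{b}| \\
 &= \bar{Z}_r\,|\mathbf{b}_1 \times (\mathbf{b}_1 \times \delta\mathbf{r})| \\
 &= \bar{Z}_r\,|\mathbf{b}_1 \times \delta\mathbf{r}|
\end{aligned}
\tag{8.136}
$$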

With systematic measurement errors, the estimation error is still a function of the
distances to objects in the scene, but no longer depends on either the field of view or the
number of correspondence points $N$.
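As a rough numerical illustration of the systematic-error result, in which the first-order angular error is the mean depth times $|\mathbf{b}_1 \times \delta\mathbf{r}|$, consider the Python sketch below. The function name is hypothetical, and the inputs (mean depth 18.95 baseline units, a 0.001 focal-length offset along the $y$ axis) are assumed example values.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def systematic_baseline_error(b1, delta_r, z_mean):
    """First-order angular error (radians): z_mean * |b1 x delta_r|."""
    c = cross(b1, delta_r)
    return z_mean * math.sqrt(sum(x * x for x in c))

z_mean = 18.95                     # assumed mean depth, in baseline units
delta_r = (0.0, 0.001, 0.0)        # assumed constant offset along y

# Offset perpendicular to the baseline: the full offset contributes.
err_perp = math.degrees(systematic_baseline_error((1.0, 0.0, 0.0), delta_r, z_mean))
# Offset parallel to the baseline: no first-order error.
err_par = math.degrees(systematic_baseline_error((0.0, 1.0, 0.0), delta_r, z_mean))
```

Only the component of the offset perpendicular to the baseline contributes, and neither the number of points nor the field of view enters the expression.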

8.2.2  First-order  error  in  the  rotation  (baseline  known) 

We now look at the case where the translation direction is known and examine the error
in the rotation estimate. To first order, this error can be associated with the incremental
adjustment computed by the Update_q() procedure when the correct $\mathbf{b}$ and $q$ are input.
Let $q_\delta$ denote the updated rotation quaternion and $q$ its true value. Using the notation of
Section 7.1.2,

$$q_\delta = \delta q\, q \tag{8.137}$$

with

$$\delta q = \left(\cos\frac{\delta\theta}{2},\ \hat{\boldsymbol{\eta}}\,\sin\frac{\delta\theta}{2}\right) \tag{8.138}$$

The error quaternion $\delta q$ corresponds to an additional rotation of $\delta\theta$ about an axis $\hat{\boldsymbol{\eta}}$
applied to the true rotation $q$. From equations (7.12), (7.15), and (7.17), $\delta\theta$ and $\hat{\boldsymbol{\eta}}$ are
computed from the vector $\mathbf{m}$, given by


m  =  —SO  fj  =  A 


(8.139) 
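Because $A_0$ and $A_0^{-1}$ are symmetric, they can be diagonalized as

$$A_0 = S V S^T, \quad \text{and} \quad A_0^{-1} = S V^{-1} S^T \tag{8.147}$$

where

$$S = (\mathbf{w}_1\ \mathbf{w}_2\ \mathbf{w}_3), \quad V = \mathrm{diag}(\mu_1, \mu_2, \mu_3), \quad \text{and} \quad S^T S = I \tag{8.148}$$

so that

$$\mathbf{h}^T A_0^{-1} A_0^{-1}\,\mathbf{h} = \frac{1}{\mu_1^2}(\mathbf{w}_1 \cdot \mathbf{h})^2 + \frac{1}{\mu_2^2}(\mathbf{w}_2 \cdot \mathbf{h})^2 + \frac{1}{\mu_3^2}(\mathbf{w}_3 \cdot \mathbf{h})^2 \tag{8.149}$$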

The  expression  for  the  estimation  error  will  depend  on  whether  the  errors  in  the  data 
are  systematic  or  random.  These  cases  are  now  examined  separately. 

Uncorrelated  random  error 

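With random measurement errors, the expected value of $\mathbf{m}$ is zero since

$$E[\mathbf{m}] = A_0^{-1} E[\mathbf{h}] = A_0^{-1} \sum_{i=1}^{N} E[\lambda_i]\,\mathbf{a}_{i0} = 0 \tag{8.150}$$

which implies, as we should expect, that $\mathbf{m}$ has no preferred direction. The variance of the
error is thus $E[\delta\theta^2]$, which can be written as

$$\overline{\delta\theta^2} = E[\delta\theta^2] = \frac{1}{\mu_1^2}\,\mathbf{w}_1^T E[\mathbf{h}\mathbf{h}^T]\,\mathbf{w}_1 + \frac{1}{\mu_2^2}\,\mathbf{w}_2^T E[\mathbf{h}\mathbf{h}^T]\,\mathbf{w}_2 + \frac{1}{\mu_3^2}\,\mathbf{w}_3^T E[\mathbf{h}\mathbf{h}^T]\,\mathbf{w}_3 \tag{8.151}$$

The important quantity to compute is clearly $E[\mathbf{h}\mathbf{h}^T]$. Using (8.143), we have

$$
\begin{aligned}
E[\mathbf{h}\mathbf{h}^T] &= E\!\left[\sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_i \lambda_j\,\mathbf{a}_{i0}\mathbf{a}_{j0}^T\right] \\
 &= \sum_{i=1}^{N} E[\lambda_i^2]\,\mathbf{a}_{i0}\mathbf{a}_{i0}^T
\end{aligned}
\tag{8.152}
$$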

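We again make the approximation that $E[\lambda_i^2]$ can be replaced by its average value so
that

$$
\begin{aligned}
E[\mathbf{h}\mathbf{h}^T] &= \overline{\lambda^2} \sum_{i=1}^{N} \mathbf{a}_{i0}\mathbf{a}_{i0}^T \\
 &= \overline{\lambda^2}\,A_0
\end{aligned}
\tag{8.153}
$$

giving

$$
\begin{aligned}
\overline{\delta\theta^2} &= \frac{\overline{\lambda^2}}{\mu_1^2}\,\mathbf{w}_1^T A_0\,\mathbf{w}_1 + \frac{\overline{\lambda^2}}{\mu_2^2}\,\mathbf{w}_2^T A_0\,\mathbf{w}_2 + \frac{\overline{\lambda^2}}{\mu_3^2}\,\mathbf{w}_3^T A_0\,\mathbf{w}_3 \\
 &= \overline{\lambda^2}\left(\frac{1}{\mu_1} + \frac{1}{\mu_2} + \frac{1}{\mu_3}\right)
\end{aligned}
\tag{8.154}
$$

which is completely analogous to equation (8.121) for the error in the baseline estimate.

We can find an expression for $\overline{\delta\theta^2}$ in the special cases $\mathbf{b} \perp \hat{\mathbf{v}}_3$ and $\mathbf{b} = \hat{\mathbf{v}}_3$ for which we
previously derived the eigenvalues of $A_0$ in Section 8.1.2. From equations (8.43)-(8.45) we
have in the case $\mathbf{b} \perp \hat{\mathbf{v}}_3$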


Ml  = 


(8.155) 


(8.156) 


m  1  +  t  +  T 


(8.157) 


while $\overline{\lambda^2}$, given in (8.124), simplifies to


r.  ,.2  „  2 

y  |(b  X  Vg)  X  +  — |n3  X  2:|2 


(8.158) 


The  variance  of  the  error  is  therefore 


0^2  r)2 

—  I(b  X  hg)  X  hp  +  — |h3  X 


2cr2  /  6  1  8 

^  8  +  4T>2  +  J94 


(8.159) 
after dropping the term in $|\hat{\mathbf{v}}_3 \times \hat{\mathbf{z}}|^2$, which is usually negligible.

We see that $\overline{\delta\theta^2}$ is proportional to the variance of the error in the data, $\sigma^2$, and inversely
proportional to the number of correspondences, $N$. As $D \to 0$, the error in the rotation
estimate increases rapidly ($\overline{\delta\theta^2} \sim 1/D^4$), while for large values of $D$, it eventually goes to
zero.

When $\mathbf{b} = \hat{\mathbf{v}}_3$ we have


Ail  =  /i2 


7iVZ»2 


and  /i3 


(8.160) 


and 


A2  = 


8 


(2-  |f  X  hap) 


The expression for $\overline{\delta\theta^2}$ thus becomes


se^  = 


(8.161) 


(8.162) 


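Again, $\overline{\delta\theta^2}$ is proportional to $\sigma^2/N$; however, its behavior as a function of $D$ is much less
severe than in the case of $\mathbf{b} \perp \hat{\mathbf{v}}_3$. As $D \to 0$, $\overline{\delta\theta^2}$ increases only as $1/D^2$, and as $D \to \infty$,
it approaches a constant value.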


Systematic  error 

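In the case of systematic measurement errors, we must derive an exact expression for
$\mathbf{h}$ in order to obtain a formula for $\delta\theta$. From equations (8.105), (8.142), and (8.143), we
have

$$
\begin{aligned}
\mathbf{h} &= \sum_{i=1}^{N} \lambda_i\,\mathbf{a}_{i0} \\
 &= \sum_{i=1}^{N} \left((\mathbf{b} \times \mathbf{r}_{i0}) \times \boldsymbol{\ell}'_i\right)\,\boldsymbol{\ell}'_i \cdot (\delta\mathbf{r} \times \mathbf{b})
\end{aligned}
\tag{8.163}
$$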


Using  (8.31),  we  can  also  write 




Approximating  by its average value and assuming that the correspondences are
distributed uniformly over the field of view, we replace the summation by an integral to
obtain


(8.165) 


The solution to this integral can be obtained by combining equations (B.12) and (B.28) of
Appendix B. Skipping the algebra, the result is:


h=  (^]N\{h-vs)  (^1+ 


t)2  \ 

1  +  —  bhs'^  (^r  X  b)  (8.166) 


We note also that we can express the average value of $\lambda_i$ as

$$\bar{\lambda} = (\delta\mathbf{r} \times \mathbf{b}) \cdot \hat{\mathbf{v}}_3 \tag{8.167}$$


Using  this  expression  in  equation  (8.166)  then  gives 

h  =  A  j  (b  •  ha)  xh)  +  x(^l-  ^  j  ha  j  _  A  (^1  +  ^  j  b  (8.168) 

Again, the behavior of the error as a function of $D$ depends on the orientation of $\mathbf{b}$ and
$\hat{\mathbf{v}}_3$. When $\mathbf{b} \perp \hat{\mathbf{v}}_3$, $\mathbf{h}$ becomes


Because  b  is  an  eigenvector  of  A  with  eigenvalue 

/■b  =  7JV  1  +  — +  -^ 


(8.169) 


(8.170) 
we find from equation (8.149) that the error becomes


\se\  = 


2\\\fzA 


7  \Zr  8  + 4D-^  + 


(8.171) 


The error does not depend on $N$, but does depend on the field of view. As $D \to 0$, the
error becomes infinite as $1/D^2$ while as $D \to \infty$, $|\delta\theta| \to 0$.

When $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, $\mathbf{h}$ becomes


4  \  Z: 


(^r  X  b) 


(8.172) 


In  this  case,  any  vector  perpendicular  to  b  is  an  eigenvector  of  A  with  eigenvalue 
=  'fN We  thus  have 

m  =  if|^Vrxb)  (8.173) 


l^^l  =  |m|  =  -  ( )|.5r  X  b| 
l\^r 


(8.174) 


The  error  is  now  independent  of  both  D  and  N . 


8.2.3  Coupling  between  estimation  errors 

In the last two sections we have analyzed the estimation errors in the cases when either
the baseline or the rotation is known. We now examine the situation when neither is known.
Let $S^*$ denote the minimum value of $S = \sum_{i=1}^{N} \lambda_i^2$, and let $S_0$ denote the value when the
correct $\mathbf{b}$ and $q$ corresponding to the actual motion are used. We can expand $S$ in a
first-order Taylor series to approximate $S^*$ as


Assuming $S^*$ corresponds to a true local minimum and is therefore unique, this expression
defines a relation between $\delta\mathbf{b}$ and $\mathbf{m}$ which must be (approximately) satisfied to achieve
optimality. The values of $\delta\mathbf{b}$ and $\mathbf{m}$ used to obtain $S^*$ from $S_0$ cannot therefore be the same
as those which minimize the error in the case of known rotation or known translation since,
by definition, each of these assumes the other to be zero.

In order to determine a constraint between $\delta\mathbf{b}$ and $\mathbf{m}$, we need to use the necessary
conditions for $S^*$ to be a (local) minimum, which are

$$C\,\mathbf{b} = \mu\,\mathbf{b} \tag{8.176}$$

where $\mu$ is the smallest eigenvalue of $C$, and

$$\mathbf{h} = \sum_{i=1}^{N} \lambda_i\,\mathbf{a}_i = 0 \tag{8.177}$$

Since there is always at least one solution to the eigenvector equation (8.176) for every
rotation, it does not provide any new information. We thus need to find solutions to (8.176)
that are also consistent with (8.177).

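Defining $\mathbf{b}_0$ as the true translation vector, we have at $S = S^*$

$$
\begin{aligned}
\lambda_i &= \mathbf{b} \cdot (\boldsymbol{\ell}'_i \times \mathbf{r}_i) \\
 &= (\mathbf{b}_0 + \delta\mathbf{b}) \cdot \left((\boldsymbol{\ell}'_{i0} + \delta\boldsymbol{\ell}'_i) \times (\mathbf{r}_{i0} + \delta\mathbf{r}_i)\right) \\
 &\approx \delta\mathbf{b} \cdot (\boldsymbol{\ell}'_{i0} \times \mathbf{r}_{i0}) + \mathbf{b}_0 \cdot (\delta\boldsymbol{\ell}'_i \times \mathbf{r}_{i0}) + \mathbf{b}_0 \cdot (\boldsymbol{\ell}'_{i0} \times \delta\mathbf{r}_i)
\end{aligned}
\tag{8.178}
$$

dropping terms that are second-order and higher in the error. Since $\lambda_i$ contains all of the
first-order error, we also consider $\mathbf{a}_i \approx \mathbf{a}_{i0}$.
From equations (7.10) and (7.12) we obtain

$$\delta\boldsymbol{\ell}'_i = \boldsymbol{\ell}'_{i0} \times \mathbf{m}_0 \tag{8.179}$$

where $\mathbf{m}_0$ represents the incremental rotation vector applied to the true rotation to get to
$S = S^*$, and can thus write

$$\mathbf{b}_0 \cdot (\delta\boldsymbol{\ell}'_i \times \mathbf{r}_{i0}) = \mathbf{b}_0 \cdot \left((\boldsymbol{\ell}'_{i0} \times \mathbf{m}_0) \times \mathbf{r}_{i0}\right) = -\mathbf{a}_{i0}^T \mathbf{m}_0 \tag{8.180}$$

Using also the fact that

$$\mathbf{b}_0 \times \boldsymbol{\ell}'_{i0} = -Z_{r_i}\,(\boldsymbol{\ell}'_{i0} \times \mathbf{r}_{i0}) = -Z_{r_i}\,\mathbf{c}_{i0} \tag{8.181}$$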
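equation (8.178) becomes

$$
\begin{aligned}
\lambda_i &= \mathbf{c}_{i0}^T\,\delta\mathbf{b} - Z_{r_i}\,\mathbf{c}_{i0}^T\,\delta\mathbf{r}_i - \mathbf{a}_{i0}^T \mathbf{m}_0 \\
 &= \mathbf{c}_{i0}^T\left(\delta\mathbf{b} - Z_{r_i}\,\delta\mathbf{r}_i\right) - \mathbf{a}_{i0}^T \mathbf{m}_0
\end{aligned}
\tag{8.182}
$$

Combining these expressions, we find that the necessary condition for optimality is thus

$$\sum_{i=1}^{N} \mathbf{a}_{i0}\left(\mathbf{c}_{i0}^T\left(\delta\mathbf{b} - Z_{r_i}\,\delta\mathbf{r}_i\right) - \mathbf{a}_{i0}^T \mathbf{m}_0\right) = 0 \tag{8.183}$$

which can also be written as

$$\sum_{i=1}^{N} \mathbf{a}_{i0}\mathbf{c}_{i0}^T\left(\delta\mathbf{b} - Z_{r_i}\,\delta\mathbf{r}_i\right) = \sum_{i=1}^{N} \mathbf{a}_{i0}\mathbf{a}_{i0}^T\,\mathbf{m}_0 = A_0\,\mathbf{m}_0 = \mathbf{h}_0 \tag{8.184}$$

where $\mathbf{h}_0$ is the value of $\mathbf{h}$ at $S = S_0$.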

If the measurement errors are random and uncorrelated, we have from equations (8.100),
(8.115) and (8.150),

$$E[\delta\mathbf{r}_i] = E[\delta\mathbf{b}] = E[\mathbf{h}_0] = 0 \tag{8.185}$$

and hence equation (8.184) does not provide any new information. With random measurement
errors, the expected values of $\delta\mathbf{b}$ and $\mathbf{m}$ are zero whether the rotation and translation
are estimated separately or together. Consequently, equations (8.121) and (8.154) for the
variances of the estimates, $\overline{\theta_b^2}$ and $\overline{\delta\theta^2}$, are also still valid.

With systematic measurement errors, however, the situation is different. We can write

N  ^7 

=  Y.-^i{hox£Uo)x£'M)ihox£'iof 

i=l  i=l 

N  y 

=  L  ^  ((bo  •  £'io)£\o£'l  -  \£'^o\^h^£'l)  Bxo  (8.186) 
2=1 

where $B_{\times 0}$ is the cross-product matrix corresponding to the operation $\mathbf{b}_0 \times$, and combine
equations (8.184) and (8.186) to give

E  ((bo  •  -  \£'^ofho£'l)  (^—{ho  X  ^b)  +  ((ir  X  bo))  =  ho  (8.187) 

Comparing this equation with the definition of $\mathbf{h}_0$ given in equation (8.164), we see that a
solution exists only if $\delta\mathbf{b} = 0$, which implies that the estimation error is entirely absorbed
by the rotation.

This is a significant practical result as it implies that one can obtain very accurate
estimates of the translation, even with poorly calibrated systems, when the full algorithm
is used. We have to be careful, however, before concluding that $\delta\mathbf{b}$ is always zero when the
rotation and translation are estimated jointly, because $\mathbf{h}$ is only a linear approximation to
the derivative of $S$ with respect to the rotation. We can write $S_0$ as

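$$
\begin{aligned}
S_0 &= \sum_{i=1}^{N} \left((\delta\mathbf{r} \times \mathbf{b}_0) \cdot \boldsymbol{\ell}'_{i0}\right)^2 \\
 &= \sum_{i=1}^{N} (\delta\mathbf{r} \times \mathbf{b}_0)^T \boldsymbol{\ell}'_{i0}\boldsymbol{\ell}'^{T}_{i0} (\delta\mathbf{r} \times \mathbf{b}_0) \\
 &\approx \frac{N}{\pi D^2} \int_0^D\!\!\int_0^{2\pi} (\delta\mathbf{r} \times \mathbf{b}_0)^T \boldsymbol{\ell}'_{i0}\boldsymbol{\ell}'^{T}_{i0} (\delta\mathbf{r} \times \mathbf{b}_0)\,\xi\,d\xi\,d\alpha \\
 &= N\bar{\lambda}^2 + \frac{N D^2}{4}\left(|\delta\mathbf{r} \times \mathbf{b}_0|^2 - \bar{\lambda}^2\right)
\end{aligned}
\tag{8.188}
$$

using the solution for the integral given in equation (B.19) and the definition $\bar{\lambda} = (\delta\mathbf{r} \times \mathbf{b}_0) \cdot \hat{\mathbf{v}}_3$
from equation (8.167).

When $\mathbf{b}_0 \parallel \hat{\mathbf{v}}_3$, $\bar{\lambda} = 0$, so that

$$S_0 = \frac{N D^2}{4}\,|\delta\mathbf{r} \times \mathbf{b}_0|^2 \tag{8.189}$$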
8.3 Experimental Verification

In order to test the results derived in this chapter, it was necessary to generate a random
data set of 3-D coordinates corresponding to world points in the scene. This was done by
first generating the $Z$ values according to a specified probability distribution and then
selecting the $X$ and $Y$ values so that the position vectors would be uniformly distributed
over the specified field of view.

Figure 8-3: Probability density function used to obtain random $Z$ values.

The probability density function used to assign $Z$ values for the tests is shown in Figure 8-3
and was determined based on a rough estimate of the distribution of depths that
are typically encountered in practice. The mean value of $Z$ was set at 19 baseline units,
which means that if the camera moves 20 cm between frames, the average distance to any
object in the scene would be 3.8 m. Once the value of $Z$ is selected, $\xi$ and $\alpha$ are chosen
uniformly over the intervals $[0, D]$ and $[0, 2\pi)$, respectively, so that $X$ and $Y$ are computed
as

$$X = Z\xi\cos\alpha, \quad \text{and} \quad Y = Z\xi\sin\alpha \tag{8.195}$$

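Given the set of 3-D points, $\mathbf{p}_\ell$, the set of correspondences, $\mathbf{p}_r$, is then generated by
applying the rigid body motion equation

$$\mathbf{p}_r = R\,\mathbf{p}_\ell + \mathbf{b} \tag{8.196}$$

to each point in $\mathbf{p}_\ell$, for some value of $R$ and $\mathbf{b}$. We thus obtain an error-free list of $N$ pairs
$\{(\mathbf{r}_i, \boldsymbol{\ell}_i)\}$, where

$$\mathbf{r}_i = \frac{1}{Z_{r_i}}\,\mathbf{p}_{r_i}, \qquad \boldsymbol{\ell}_i = \frac{1}{Z_{\ell_i}}\,\mathbf{p}_{\ell_i} \tag{8.197}$$

Measurement errors are then simulated by determining a vector $\delta\mathbf{r}_i$,

$$\delta\mathbf{r}_i = \begin{pmatrix} \rho_i \cos\beta_i \\ \rho_i \sin\beta_i \\ 0 \end{pmatrix} \tag{8.198}$$

as previously defined in equation (8.99).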
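The error-free data generation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation program used for the tests: the function names are hypothetical, the rotation is restricted to the $z$ axis, and the depth is drawn from a uniform placeholder distribution rather than the density of Figure 8-3. The sketch also evaluates the triple product $\mathbf{b} \cdot (R\boldsymbol{\ell}_i \times \mathbf{r}_i)$, which should vanish for error-free pairs.

```python
import math
import random

def rotate_z(theta, p):
    """Rotate the 3-vector p by angle theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_pairs(n, d, b, theta, seed=0):
    """Generate n error-free (r_i, l_i) ray pairs for motion (rotate_z, b)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z = rng.uniform(5.0, 30.0)              # placeholder depth distribution
        xi = rng.uniform(0.0, d)                # radial image coordinate
        alpha = rng.uniform(0.0, 2.0 * math.pi)
        p_l = (z * xi * math.cos(alpha), z * xi * math.sin(alpha), z)
        rp = rotate_z(theta, p_l)
        p_r = tuple(a + t for a, t in zip(rp, b))   # p_r = R p_l + b
        ell = tuple(a / p_l[2] for a in p_l)        # left ray, z component 1
        r = tuple(a / p_r[2] for a in p_r)          # right ray, z component 1
        pairs.append((r, ell))
    return pairs

d = math.tan(math.radians(20.0))
b = (1.0, 0.0, 0.0)
theta = math.radians(5.0)
pairs = make_pairs(50, d, b, theta)
residuals = [dot(b, cross(rotate_z(theta, ell), r)) for r, ell in pairs]
```

The residuals are zero up to floating-point rounding, which gives a quick consistency check on the generated correspondences before any error vectors are added.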

The magnitude of the error, $\rho_i$, is specified in units of the focal length, $f$, so that it is
independent of both the field of view and the spatial discretization of the sensor. To convert
from units of $f$ to pixels, one can divide $\rho_i$ by the pixel width $2D/n$, where $n$ is the number
of pixels along one dimension of the sensor. For example, with a 20° field of view ($D = .364$) on a
256x256 pixel array, $\rho = .0028f$ corresponds to 1 pixel.
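The unit conversion can be written out explicitly. In the Python sketch below, the function name is hypothetical; it assumes that the image spans $2D$ focal lengths across $n$ pixels, so that one pixel is $2D/n$ focal lengths wide and an error in focal-length units is divided by that width.

```python
import math

def focal_units_to_pixels(rho, d, n_pixels):
    """Convert an error magnitude from focal-length units to pixels."""
    pixel_width = 2.0 * d / n_pixels    # width of one pixel, in focal lengths
    return rho / pixel_width

d = math.tan(math.radians(20.0))        # D = 0.364 for the 20 degree case
pixels = focal_units_to_pixels(0.0028, d, 256)
```

For the values quoted in the text the result is very nearly one pixel.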

For systematic errors, the direction $\delta\hat{\mathbf{r}}$ and the magnitude $\rho$ are given as input to the
simulation program. For these tests, $\delta\hat{\mathbf{r}}$ was set equal to $\hat{\mathbf{y}}$ for simplicity and a value was
selected for $\rho$ between 0 and $0.025f$. For uncorrelated random errors, each vector $\delta\mathbf{r}_i$ was
determined independently by choosing $\beta_i$ uniformly over the interval $[0, 2\pi)$ and $\rho_i$ over the
interval $[0, R_{max}]$, where $R_{max}$ was supplied as an input parameter. Given that the vectors
are uniformly distributed over a disk of area $\pi R_{max}^2$, we thus have

$$E[\delta\mathbf{r}_i^T \delta\mathbf{r}_i] = E[\rho_i^2] = \frac{R_{max}^2}{2} \tag{8.199}$$

Values of $R_{max}$ were chosen in the simulations to give $\sigma$ between 0 and $0.02f$.

In order to compare the equations determined in Sections 8.2.1 and 8.2.2 for $\overline{\theta_b^2}$ and $\overline{\delta\theta^2}$
with the estimation errors actually computed for data corrupted by random error, several
tests were performed for different types of motion and different values of $D$ and $N$. The
results of two series of tests, one with $\mathbf{b} \perp \hat{\mathbf{v}}_3$ and the other with $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, are shown in
Figures 8-4 and 8-5. In both series the rotation was given by $\hat{\boldsymbol{\omega}} = (0, 0, 1)$ with $\theta = 5°$, so
that $\hat{\mathbf{v}}_3 = \hat{\mathbf{z}}$, while the translation for the first series was specified as $\mathbf{b} = (1, 0, 0)$ and in the
second as $\mathbf{b} = (0, 0, 1)$. The same set of 3-D coordinates $\mathbf{p}_\ell$, with $N = 50$, $\bar{Z} = 18.95$, and
$\kappa = E[1/Z^2] = .0062$, was used for all tests. For each motion, actual and predicted errors
for the translation and the rotation were computed for viewing fields of $\phi = 20°$, $\phi = 40°$,
and $\phi = 60°$ ($D = 0.364$, $D = 0.840$, and $D = 1.732$).

[Figure 8-4 panels: baseline error and rotation error, in degrees of error versus sigma, for fov = 20, 40, and 60, with N = 50 and kappa = 0.006217.]

Figure 8-4: Estimation errors in translation and rotation with $\mathbf{b} \perp \hat{\mathbf{v}}_3$ (uncorrelated
measurement errors), $\mathbf{b} = (1, 0, 0)$, $\hat{\boldsymbol{\omega}} = (0, 0, 1)$, $\theta = 5°$

[Figure 8-5 panels: degrees of error versus sigma.]

Figure 8-5: Estimation errors in translation and rotation with $\mathbf{b} \parallel \hat{\mathbf{v}}_3$ (uncorrelated
measurement errors), $\mathbf{b} = (0, 0, 1)$, $\hat{\boldsymbol{\omega}} = (0, 0, 1)$, $\theta = 5°$

[Figure 8-7 panels: ratio of eigenvalues of C with b = (0,0,1), for fov = 20, 40, and 60, with N = 50 and N = 100.]

Figure 8-7: Actual and predicted values of $\mu_2/\mu_3$ for $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, for $\phi$ = 20°, 40°, and 60°
with $N = 50$ and $N = 100$.

Since the equations for $\overline{\theta_b^2}$ and $\overline{\delta\theta^2}$ only predict expected values, a total of 16 different
data sets were used for each test and each value of $\sigma$ in order to obtain a statistically
significant approximation of the average error. The data for each test were generated
from the same set of ideal correspondences computed for the given motion by initializing
the random number generator used to determine the $\delta\mathbf{r}_i$ with different seeds. The errors,
shown in Figures 8-4 and 8-5, are plotted as circles when they were computed with either
the rotation or the translation given, and as asterisks when the full motion was estimated.
In many cases, particularly with $\phi = 20°$ and $\mathbf{b} \perp \hat{\mathbf{v}}_3$, the estimation of the translation was
unstable in the sense that the eigenvector corresponding to the smallest eigenvalue of $C$
was not the one which was closest to the true translation. In these cases, the error reported
is the smallest angle between the true value of $\mathbf{b}$ and either of the other two eigenvectors,
and is marked on the graph with the symbol ‘U’.
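The reporting rule just described can be sketched in Python as follows. The function names are hypothetical, and treating eigenvectors as sign-free directions (so that angles are folded into the range 0° to 90°) is an assumption of this sketch, since the text does not state how the sign ambiguity was handled.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axis_angle_deg(u, v):
    """Angle in degrees between two directions, ignoring their signs."""
    c = abs(dot(u, v)) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(min(1.0, c)))

def reported_error(b_true, eigvecs):
    """eigvecs are ordered by increasing eigenvalue of C.

    Returns (error_in_degrees, unstable). The estimate is flagged unstable
    when the smallest-eigenvalue eigenvector is not the closest to b_true;
    the reported error is then the smaller angle to the other two.
    """
    angles = [axis_angle_deg(b_true, e) for e in eigvecs]
    closest = min(range(len(angles)), key=lambda k: angles[k])
    if closest == 0:
        return angles[0], False
    return min(angles[1], angles[2]), True

# Hypothetical example: the middle eigenvector lies 10 degrees from the truth.
t = math.radians(10.0)
b_true = (1.0, 0.0, 0.0)
eigvecs = [(0.0, 1.0, 0.0),
           (math.cos(t), 0.0, math.sin(t)),
           (-math.sin(t), 0.0, math.cos(t))]
error_deg, unstable = reported_error(b_true, eigvecs)
```

In this example the estimate would be flagged as unstable and plotted with the reported error of about 10°.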

The averages of the computed errors agree well with the predicted values in most cases.
There is a considerable spread in the results for both the baseline and rotation errors for
motions with $\mathbf{b} \perp \hat{\mathbf{v}}_3$; however, this should be expected from the higher sensitivity to error in
this case. There is much less spread in the results when $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, and also many fewer instances
of instability, although there does appear to be an increase in the spread of translation
estimates at larger fields of view. On closer examination, however, it can be seen that the
variation in this case is mostly in the errors from the full motion estimation, while errors
in computing the translation with known rotation cluster well about the predicted average.
The reason for the increased variation in the estimates from the complete algorithm, in the
case of large viewing fields with $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, is not apparent from the first-order analysis of this
chapter, although we can conjecture that it is caused by the nonlinear terms which were
neglected.

The predictions of Sections 8.1.1 and 8.1.3 for the ratio of the middle and largest
eigenvalues of C were also tested for these simulated motions. In Figures 8-6 and 8-7 the ratios
predicted from the estimated b and v̂₃, as well as the actual values computed from the
matrix C itself, are shown for each value of σ and φ. As before, each test was performed
16 times with different sets of randomized errors added to the correspondence data. The
dashed lines indicate the best-fitting curves to the results from the 16 tests plotted as a
function of σ. It should be noted that the results from all of the tests are shown, including
those for which the translation estimate was unstable. The actual ratios are always computed
from the middle and largest eigenvalues of C. However, the predicted ratio is computed
using the eigenvector closest to the true baseline as the estimate of the translation.

The first observation which can be made based on these tests is that there is a difference
between the predicted and actual values for μ₂/μ₃ when b ⊥ v̂₃ which increases with the
field of view. This effect can be attributed to the fact that the value of D = tan φ was
used to compute the predicted ratio, while it probably would have been better to use
D/√2, which is the average distance of points from the center of the image, and which is
a more appropriate statistical measure of the radius of the viewing field. With b exactly
perpendicular to v̂₃ the predicted ratio from equation (8.23) is μ₂/μ₃ = D²/4. However,
the actual ratios agree much better with D²/8, which for φ = 20°, 40°, and 60° would give
predicted ratios of .016, .09, and .375, respectively.
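
The quoted ratios follow directly from D = tan φ. The short check below is only a sketch: the function name is ours, and it assumes that equation (8.23) reduces to D²/4 for b exactly perpendicular to v̂₃, so that substituting D/√2 for D gives D²/8.

```python
import math

def predicted_ratios(phi_deg):
    """Return (D^2/4, D^2/8) with D = tan(phi), phi given in degrees.

    D^2/4 is assumed to be the value of equation (8.23); D^2/8 is the
    same expression with D replaced by D/sqrt(2).
    """
    D = math.tan(math.radians(phi_deg))
    return D * D / 4.0, D * D / 8.0

for phi in (20, 40, 60):
    r4, r8 = predicted_ratios(phi)
    print(f"phi = {phi} deg: D^2/4 = {r4:.3f}, D^2/8 = {r8:.3f}")
```

The D²/8 column can be compared with the values .016, .09, and .375 given in the text.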

Since equations (8.20) and (8.21) for the eigenvalues of C were derived by modeling
the discrete correspondences as a uniform density spread over the field of view, tests were
performed with both N = 50 and N = 100 to assess how strongly the eigenvalue ratios
are affected by this approximation, which clearly depends on N. With b ⊥ v̂₃ there is
not a significant difference in the results for the values of N tested; however, there is a
large difference for b ∥ v̂₃. This difference can be explained by noting that the ratio
μ₂/μ₃ is a measure of the symmetry in the distribution of the vectors c_i = ℓ′_i × r_i in the
plane perpendicular to b. The equality of the two largest eigenvalues in the case b ∥ v̂₃
predicted in equation (8.25) is thus a direct consequence of the uniform density assumption.
Nonetheless, the values of μ₂/μ₃ are seen to be consistently higher when b ∥ v̂₃ than when
b ⊥ v̂₃, except in the case of φ = 60° with N = 50, where the values are similar. It can also
be seen that the difference in the ratios is largely unaffected either by errors in the data or
by the fact that the algorithm sometimes converged to the wrong solution.

It can thus be concluded that the ratio test is an effective indicator for discriminating
between correct and false solutions. However, in order to implement the test on a real
system, it is necessary to first develop a baseline profile of the expected ratios for translations
parallel  and  perpendicular  to  the  image  plane,  rather  than  predicting  their  values  from 
equations  (8.20)  and  (8.21),  since  these  will  depend  on  the  actual  geometry  of  the  sensor 
and  focal  length  of  the  lens,  as  well  as  on  the  average  number  of  correspondence  points 
returned  by  the  matching  procedure. 
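
To make the quantity being tested concrete, the following sketch computes the ratio of the middle to the largest eigenvalue of C. It assumes that C is the sum of outer products of the vectors c_i, which is our reading of the discussion above (equation (7.6) is not reproduced here), and it uses a standard closed-form method for symmetric 3x3 matrices rather than anything specified in the thesis. All function names are illustrative. No acceptance threshold is included, since that would come from the measured baseline profile.

```python
import math

def scatter_matrix(vectors):
    """Accumulate C = sum_i c_i c_i^T for a list of 3-vectors (assumed form of C)."""
    C = [[0.0] * 3 for _ in range(3)]
    for c in vectors:
        for r in range(3):
            for s in range(3):
                C[r][s] += c[r] * c[s]
    return C

def sym3_eigenvalues(A):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order (trigonometric method)."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p2 = sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    if p2 == 0.0:                      # A is a multiple of the identity
        return [q, q, q]
    p = math.sqrt(p2 / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
           - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
           + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # guard against rounding outside [-1, 1]
    ang = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(ang)
    smallest = q + 2.0 * p * math.cos(ang + 2.0 * math.pi / 3.0)
    return [smallest, 3.0 * q - largest - smallest, largest]

def eigenvalue_ratio(vectors):
    """Ratio of the middle to the largest eigenvalue of C."""
    _, middle, largest = sym3_eigenvalues(scatter_matrix(vectors))
    return middle / largest

# Vectors spread evenly around the plane perpendicular to b = (0, 0, 1)
ring = [(math.cos(2 * math.pi * k / 36), math.sin(2 * math.pi * k / 36), 0.0)
        for k in range(36)]
print(eigenvalue_ratio(ring))
```

A symmetric distribution of the c_i in the plane perpendicular to b drives the ratio toward one, while a strongly one-sided distribution drives it toward zero, which is the behavior the ratio test relies on.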

Finally, the case of systematic measurement errors was investigated for b ⊥ v̂₃ and
b ∥ v̂₃ with the results shown in Figures 8-8 and 8-9 for φ = 20°, 40°, and 60°. Since
the errors are deterministic, it was not necessary to make numerous tests, as in the case of
random data, and so the results of only one simulation are shown for each motion and field
of view.

Figure 8-8: Errors in estimates of translation and rotation with b ⊥ v̂₃ (systematic
measurement errors). b = (1,0,0), ω̂ = (0,0,1), θ = 5°. Panels: error in baseline estimate
(rotation known; rotation computed) and error in rotation estimate (translation known;
translation computed).

Figure 8-9: Errors in estimates of translation and rotation with b ∥ v̂₃ (systematic
measurement errors). b = (0,0,1), ω̂ = (0,0,1), θ = 5°. Panels as in Figure 8-8.

It can be seen that the actual errors agree well with those predicted by equations (8.136),
(8.171) and (8.174). For translation with known rotation, the errors are essentially
independent of the field of view for both b ⊥ v̂₃ and b ∥ v̂₃, although there is a slight difference
in both cases for φ = 20°. For the estimates of rotation with known translation, the errors
do increase as φ decreases when b ⊥ v̂₃, but are independent of φ when b ∥ v̂₃.

When the full motion is computed, the errors in the rotation scarcely change from those
computed with the translation known. Errors in the translation estimates, however, are
reduced drastically. With b ⊥ v̂₃, the error is removed almost entirely at φ = 20° and is
only slightly worse at larger fields of view. With b ∥ v̂₃, on the other hand, the error is
essentially removed at all values of φ.

As previously noted, the fact that one can obtain very accurate estimates of the
translation from the complete algorithm when the measurement errors are correlated is significant,
as it implies that accurate internal camera calibration is not required to obtain useful
information for many applications, such as passive navigation, that do not require highly precise
estimates of the rotation.


Chapter  9 


Summary  of  Part  I 


Many  issues  have  been  covered  in  the  preceding  chapters,  and  it  is  useful  to  summarize 
the  major  results  and  conclusions. 

In the architecture outlined in Chapter 4, it was determined that processing in the
motion  system  should  be  divided  into  three  stages: 

•  Edge  detection, 

•  Feature  matching  by  block  correlation  of  the  binary  edge  maps,  and 

•  Solving  the  motion  equations. 

Edge detection would be performed directly on the analog signals acquired by the
photosensors using a fully parallel analog array processor implementing the multi-scale veto
algorithm.  In  Chapter  5,  this  edge  detection  algorithm  was  shown  to  have  the  following 
advantages  over  classical  methods: 

• There is no tradeoff between edge localization and smoothing. Noise and unwanted
minor features can be effectively removed by adjusting the sequence of thresholds
applied at each smoothing cycle. The edges which are detected, however, remain
localized at the positions of the features in the original image regardless of the thresholds
used.

• The method does not require computing second differences or searching for
zero-crossings. Hence the circuitry is much simpler, and all processing is local.

• The algorithm is designed to take advantage of the signal processing capabilities of
CCDs. It can thus be efficiently implemented on a CCD array with circuitry placed
at each pixel to compute differences and store edge signals.

At the end of Chapter 3, it was concluded that the most appropriate method for
obtaining the correspondence points needed by the motion algorithm was a block-correlation
procedure using the binary edge maps produced in the first processing stage. Due to the
presence of repeating patterns which occur naturally in real scenes, however, similarity
measures alone cannot determine the best matches and achieve an acceptably low false-alarm
rate. In Chapter 6, a series of tests was developed to add to the correlation procedure
in order to minimize the error rate. These tests included rejecting matches from blocks
which have too few or too many edge pixels to give a low probability of a false match, and
rejecting altogether blocks for which there were multiple possible matches.
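
The two screening tests summarized above can be sketched as follows. This is an illustration only: the function name, the representation of candidate matches as a dictionary of similarity scores, and all three threshold values are hypothetical and are not taken from Chapter 6.

```python
def accept_block(edge_pixels, block_area, scores,
                 min_frac=0.1, max_frac=0.5, min_score=0.8):
    """Screen one block's candidate matches; return the accepted offset or None.

    edge_pixels : number of edge pixels in the block
    block_area  : total number of pixels in the block
    scores      : dict mapping candidate offset -> similarity in [0, 1]
    The three thresholds are placeholder values chosen for illustration.
    """
    frac = edge_pixels / block_area
    if frac < min_frac or frac > max_frac:
        return None                      # too few or too many edge pixels
    good = [offset for offset, s in scores.items() if s >= min_score]
    if len(good) != 1:
        return None                      # no match, or several possible matches
    return good[0]

print(accept_block(20, 100, {(0, 0): 0.9, (1, 0): 0.3}))
```

Rejecting ambiguous blocks outright, rather than picking the best-scoring candidate, trades a smaller number of correspondences for a lower false-match rate.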

Chapter  7  covered  the  development  of  the  algorithm  to  estimate  motion  from  the  set  of 
point  matches  found  in  stage  2.  Due  to  the  complexity  and  nonlinearity  of  the  equations 
involved,  it  was  determined  that  the  motion  algorithm  should  be  executed  on  a  standard 
digital processor. Nonetheless, given the goal of building a low-power system, it was
necessary to simplify the algorithm as much as possible so that minimal processing power would
be required. It was shown that by alternating the procedures for updating the baseline
and the rotation, the complexity of the operations could be considerably reduced such that
the most complex computation would be to solve a 3x3 eigenvalue-eigenvector equation
at each iteration. Simulations of the algorithm on several image sequences using the point
correspondences determined by the edge detection algorithm and the matching procedure
showed  that  these  simple  methods  developed  for  efficient  implementation  in  VLSI  were 
as  effective  for  obtaining  accurate  estimates  of  the  motion  as  more  complex  procedures 
commonly  implemented  in  software. 

Finally,  we  could  not  build  a  robust  system  for  computing  motion  without  studying  the 
effects  of  measurement  errors  on  the  estimates,  and  how  these  effects  are  related  to  the  type 
of  motion  and  scene  structure,  as  well  as  to  the  spatial  resolution  of  the  image  sensor,  the 
field  of  view,  and  the  number  of  correspondence  points  found.  In  Chapter  8,  the  numerical 
stability  of  the  motion  algorithm  was  thoroughly  analyzed  and  expressions  were  derived  for 
the  expected  estimation  error  in  the  cases  of  both  random  and  systematic  measurement 
errors. Three important results were obtained from this analysis:

• A test was developed for reliably determining when the motion algorithm converges
to an incorrect solution based on the ratio of the two largest eigenvalues of the matrix
C, defined in equation (7.6).

•  Design  guidelines  were  developed  for  building  a  system  with  the  appropriate  sensor 
resolution,  field  of  view,  and  number  of  matching  circuits  to  achieve  a  given  maximum 
expected  estimation  error. 

•  It  was  also  discovered  that  precise  internal  camera  calibration  was  not  required  to 
obtain  accurate  estimates  of  the  translation,  provided  that  the  rotation  is  estimated 
as  well.  This  significant  result  implies  that  applications  such  as  navigation  which 
require  accurate  knowledge  of  the  translation  direction,  but  which  are  less  sensitive 
to  errors  in  the  rotation,  can  implement  the  motion  system  without  also  needing 
sophisticated  calibration  procedures. 


Part  II 

Design  of  a  CCD-CMOS 
Multi-Scale  Veto  Chip 


Chapter 10


Basic  Requirements 

The  plan  for  the  design  of  the  multi-scale  veto  (MSV)  edge  detector  was  outlined  in 
Chapter  5.  A  two-dimensional  CCD  array  as  shown  in  Figure  5-3  is  ideally  suited  for 
performing the successive smoothing operations required by the algorithm. The remaining
tasks of computing the differences between the smoothed brightness values at neighboring
pixels and testing whether the magnitude of these differences is above a given threshold are
then performed by additional circuitry placed between each pair of nodes within the array.

Several considerations were important in determining the design of the different elements
in the MSV edge detection processor. One of the major concerns was the silicon
area required for each pixel. Since the number of pixels in the image array directly impacts
the  robustness  of  the  motion  estimates  which  can  be  derived  from  the  edge  maps,  it  is 
important  that  each  cell  in  the  array  be  as  small  as  possible.  One  of  the  ways  to  reduce  the 
per-pixel  area,  which  is  already  incorporated  in  the  design,  is  by  using  time  as  a  dimension. 
Since  the  threshold  tests  are  performed  sequentially  at  the  end  of  each  smoothing  cycle, 
only  one  difference  and  test  circuit  is  required  for  each  node  pair.  However,  this  also  means 
that  space  is  needed  to  store  the  intermediate  results  and  that  the  internal  circuits  must  be 
fast  enough  to  complete  all  of  the  tests  within  the  time  allotted  for  processing  the  image. 

Small area implies simple circuits. However, if the edge detector is to produce useful
results, the need for simplicity cannot compromise the resolution requirements of the
algorithm. The attenuation factors for several idealized image features were given in Table 5.1
as a function of the number of smoothing cycles performed. If equation (5.8) is used to
compute the thresholds τ_k for each smoothing cycle k = 0,...,n, the internal difference
and test circuits must be able to resolve differences as large as τ_0, the initial threshold, and
as small as τ_n = G_n τ_0, where G_n is the attenuation of the chosen model feature after n
smoothing cycles.

To  translate  this  requirement  into  a  percentage  of  the  full  scale  range  (FSR),  we  can 
take  a  specific  example  with  n  =  5  using  the  horizontal  step  edge  as  the  model  feature.  In 
grayscale images with normal contrast, τ_0 is usually set at around 10% of FSR, and from
Table 5.1, the attenuation factor for the horizontal step edge after 5 cycles is seen to be
G_5 = 0.246. The range of distinguishable differences must thus be between 10% and 2.5%
of FSR, or in terms of bits of precision, between 4 and 5 bits. If either the diagonal step
edge or the horizontal 2-pixel line is used as the model, the resolution requirement jumps
to  between  4  and  6  bits. 
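
The arithmetic behind this example can be checked with a few lines. The sketch assumes only the relation τ_n = G_n τ_0 and the two numbers quoted in the text (τ_0 = 10% of FSR, G_5 = 0.246); the function names are ours, and expressing a fraction of full scale as log2(1/fraction) is one simple way of converting to bits, not necessarily the convention used to arrive at the rounded figures in the text.

```python
import math

def threshold_range(tau0_frac, attenuation):
    """Return (largest, smallest) threshold as fractions of full scale."""
    return tau0_frac, tau0_frac * attenuation

def equivalent_bits(fraction):
    """Number of bits b for which 2**-b equals the given fraction of full scale."""
    return math.log2(1.0 / fraction)

largest, smallest = threshold_range(0.10, 0.246)
print(f"thresholds: {largest:.1%} down to {smallest:.2%} of FSR")
print(f"equivalent bits: {equivalent_bits(largest):.2f} to {equivalent_bits(smallest):.2f}")
```

The smallest threshold comes out just under 2.5% of FSR, and the two bit counts bracket the range of roughly 4 to 5 bits quoted for the horizontal step edge.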

Designing a small absolute-value-of-difference circuit with this much resolution has been
one of the major challenges in building a working MSV chip. In the smoothing and
segmentation chip designed by Keast [86], a CCD-based absolute-value-of-difference circuit was
used, primarily because of its small size with respect to a transistor-based design. A
similar structure was also included in the stereo disparity chip designed by Hakkarainen [65].
Unfortunately, for reasons which Keast discovered and which will be discussed in the next
chapter, the CCD circuit has a ‘dead zone’ for small differences which limits its resolution
to less than 25% of FSR. It was thus necessary to design a new transistor-based
absolute-value-of-difference circuit occupying the least area possible.

Given  the  above  constraints,  a  32x32  prototype  array  implementing  the  MSV  algorithm 
was designed and fabricated through MOSIS using the Orbit 2 μm CCD/CMOS process.
The  next  three  chapters  are  devoted  to  discussing  the  design  and  testing  of  this  array  and  to 
analyzing  the  changes  required  to  build  a  full-size  (~256x256)  processor,  possibly  operating 
as  an  image  sensor  as  well  as  an  edge  detector.  In  order  to  clarify  aspects  of  the  design 
involving  CCDs,  the  following  chapter  describes  the  basic  physics  of  charge  storage  and 
transfer  and  the  input  and  output  structures  used  to  interface  with  CCD  arrays.  Chapter  12 
covers in detail the design of each of the major components in the MSV array, and finally,
Chapter 13 describes the test system and results.


Chapter  11 


Charge  Coupled  Device  Fundamentals 


Charge  coupled  devices  are  based  on  the  principle  that  a  charge  packet  may  be  confined 
within  a  potential  well  created  by  applying  a  voltage  to  a  polysilicon  gate  and  may  be  moved 
from  one  location  to  another  by  appropriately  manipulating  the  gate  voltages.  Conceptually, 
it  is  useful  to  think  of  CCDs  as  buckets  and  of  the  signal  charge  as  water.  The  process  of 
charge  transfer  is  similar  to  that  of  moving  the  water  by  placing  the  buckets  on  risers,  as 
shown  in  Figure  11-1,  with  tubes  connecting  the  bases  of  neighboring  buckets.  The  water 
levels  in  adjacent  buckets  are  determined  by  the  relative  heights  of  the  risers,  just  as  charge 
levels  under  adjacent  gates  of  a  CCD  are  determined  by  the  relative  difference  in  the  well 
potentials. 

The  physics  of  CCD  operation  is  of  course  considerably  different  from  that  of  the  bucket 
brigade.  CCDs  exist  in  two  forms:  surface  channel  and  buried  channel  devices.  Both 
operate  on  the  same  principle  of  charge  transfer;  however,  their  physical  structure  and 
device characteristics are very different. It is useful to examine both structures in order to
understand  the  advantages  and  limitations  of  each. 

11.1  Surface  Channel  Devices 

The simplest form of CCD, which is the surface-channel device, is constructed from a
series of adjacent MOS capacitors operating in deep-depletion mode. Figure 11-2 shows the
typical structure of a MOS capacitor formed by sandwiching a thin layer of oxide, SiO₂,
between a polysilicon gate and a p-type semiconductor substrate. When a voltage greater
than the flatband voltage V_fb is applied to the gate, the substrate is depleted of majority
carriers and a depletion layer is formed as shown in Figure 11-3a. If the gate voltage is
raised above the threshold voltage V_T of the material, defined as the point at which strong
inversion occurs [92], minority electrons are attracted to the Si-SiO₂ interface and the
depletion region ceases to increase in depth (Figure 11-3b).

In  order  for  the  inversion  channel  to  form,  there  must  be  a  supply  of  available  electrons. 
In  an  NMOS  transistor,  electrons  in  the  conducting  channel  are  supplied  by  the  metal 
contact  made  to  the  source  diffusion.  In  the  MOS  capacitor,  the  electrons  must  come 
from  the  substrate  where  they  are  produced  by  thermal  generation  of  electron-hole  pairs. 
Thermal  generation  in  the  bulk  results  in  a  flow  of  electrons  from  the  substrate  to  the  high 
potential region at the surface, known as dark current, which continues until equilibrium
conditions are obtained. In a well-designed process, however, the dark current density, J_d,
is typically < 1 nA/cm². At this level, the time required for the device to reach equilibrium
is  on  the  order  of  minutes  [86]. 
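
The order of magnitude of this equilibration time can be estimated by dividing the equilibrium inversion charge by the dark current density. The sketch below is a rough estimate under stated assumptions: the inversion charge is approximated as C_ox (V_g − V_T), and the oxide thickness (40 nm) and voltages are illustrative values that do not come from the text. Only the 1 nA/cm² dark current bound is taken from the paragraph above.

```python
EPS0 = 8.854e-14           # permittivity of free space, F/cm
EPS_OX = 3.9 * EPS0        # permittivity of SiO2, F/cm

def equilibration_time(t_ox_cm, v_gate, v_threshold, j_dark):
    """Seconds for a dark current j_dark (A/cm^2) to supply the equilibrium
    inversion charge, approximated as C_ox * (Vg - VT), of a MOS capacitor."""
    c_ox = EPS_OX / t_ox_cm                        # F/cm^2
    q_equilibrium = c_ox * (v_gate - v_threshold)  # C/cm^2
    return q_equilibrium / j_dark

# Assumed example: 40 nm oxide, 5 V above threshold, 1 nA/cm^2 dark current
t = equilibration_time(t_ox_cm=40e-7, v_gate=6.0, v_threshold=1.0, j_dark=1e-9)
print(f"{t / 60:.1f} minutes")
```

With these assumed values the estimate lands in the range of minutes, consistent with the figure cited from [86].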

CCDs  exploit  this  long  equilibration  time  to  perform  useful  signal  processing  tasks. 
When V_g is raised above V_T, the depletion region initially extends beyond its maximum
equilibrium depth, as shown in Figure 11-3c. This is the condition known as deep depletion.
Signal  charge  may  be  introduced  into  the  device  either  optically  or  electrically  and  will  be 
confined  to  the  potential  well  until  its  maximum  capacity  is  reached.  The  maximum  charge 
which  can  be  held  is  equal  to  the  channel  charge  of  the  capacitor  at  equilibrium  and  is 
a  linear  function  of  the  applied  gate  voltage.  By  placing  gates  connected  to  independent 
voltages  adjacent  to  one  another,  the  signal  charge  may  be  transferred  between  gates  just 
as the water is moved in the bucket brigade.

The primary advantage of surface channel devices is the linear relation between V_g
and Q_max, the maximum signal charge. These devices suffer, however, from poor transfer
efficiency due to the high number of interface states at the Si-SiO₂ boundary which can
trap charge and then release it at a random time afterward. Trapping results in both noise
and signal degradation when the charge must be transferred through a long series of gates.
Consequently,  surface  channel  devices  are  never  used  in  the  design  of  large  high-quality 
sensors.  Their  chief  application  is  in  performing  operations  where  the  linearity  of  signal 
charge  with  gate  voltage  is  important  and  where  only  a  few  gates  are  needed. 

11.2  Buried  Channel  Devices  (BCCDs) 

Like the surface channel device, the buried channel CCD is also a non-equilibrium
structure. In the BCCD, however, the signal charge is held away from the Si-SiO₂ interface
because the potential maximum occurs inside the channel, several hundred nanometers below
the surface. The buried channel is created by adding an n-doped implant below the transfer
gates. Electrical contact is made to the channel at n+ diffusions placed at the extremities
of the gate array, and when a sufficiently large positive voltage is applied with respect to
the substrate, the buried layer is completely depleted of majority carriers.

The potential profile, φ(x), with depth in the semiconductor typically resembles the
curves shown in Figure 11-4 [93], with the dotted and solid lines representing the profiles
with and without signal charge, respectively. Here x represents depth below the oxide layer,
with x = 0 at the Si-SiO₂ interface. As seen in the diagram, the addition of the buried layer
creates a non-monotonic profile such that the maximum potential φ_max resides at a distance
x_max below the interface. If electrons are injected into the layer, they will accumulate near
x_max rather than at the surface.

Figure 11-4: Potential profile with depth of a buried channel CCD

The BCCD structure may be modeled as a series connection of lumped capacitances, as
illustrated in Figure 11-5. C_ox represents the oxide capacitance, ε_ox/t_ox, while C_d1 and C_d2
represent the depletion capacitances between the channel and the oxide and the channel and
the substrate, respectively. The signal charge occupies a finite width, W_ch, which cannot
be neglected in computing the values of C_d1 and C_d2. Conventionally, this width is divided
equally between the two depletion capacitances [92], resulting in the following expressions:

\[ C_{d1} = \frac{\epsilon_{Si}}{x_{max} - W_{ch}/2} \tag{11.1} \]

and

\[ C_{d2} = \frac{\epsilon_{Si}}{x_{ch-b} + W_{ch}/2} \tag{11.2} \]

with

\[ x_{ch-b} = x_n + x_p - x_{max} \tag{11.3} \]

The effective capacitance C_eff between the signal charge and the gate is the series
combination of C_ox and C_d1:

\[ \frac{1}{C_{eff}} = \frac{1}{C_{ox}} + \frac{1}{C_{d1}} \tag{11.4} \]

Figure 11-5: Lumped capacitance model of a BCCD

We  can  immediately  see  two  effects  of  the  buried  channel.  The  first  is  that  the  signal 
carrying  capacity  is  less  than  that  of  a  surface  channel  device  due  to  the  decrease  in  the 
effective  capacitance  caused  by  moving  the  charge  away  from  the  surface.  Second,  the 
effective  capacitance  depends  nonlinearly  on  both  the  amount  of  charge  present  and,  as  will 
be  seen  shortly,  on  the  gate  voltage. 
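
Both effects can be seen numerically from the lumped model. The sketch below assumes the series combination 1/C_eff = 1/C_ox + 1/C_d1 with C_d1 = ε_Si/(x_max − W_ch/2), as in equations (11.1) and (11.4); the oxide thickness, channel depth, and packet widths are illustrative values chosen by us, not parameters of any process described here.

```python
EPS0 = 8.854e-12            # permittivity of free space, F/m
EPS_OX = 3.9 * EPS0         # SiO2
EPS_SI = 11.7 * EPS0        # silicon

def effective_capacitance(t_ox, x_max, w_ch):
    """C_eff per unit area (F/m^2): C_ox in series with
    C_d1 = eps_Si / (x_max - w_ch / 2). All lengths in meters."""
    c_ox = EPS_OX / t_ox
    c_d1 = EPS_SI / (x_max - w_ch / 2.0)
    return 1.0 / (1.0 / c_ox + 1.0 / c_d1)

# Assumed example geometry: 40 nm oxide, potential maximum 0.3 um deep
t_ox, x_max = 40e-9, 0.3e-6
c_ox = EPS_OX / t_ox
for w_ch in (0.0, 0.05e-6, 0.1e-6):
    c_eff = effective_capacitance(t_ox, x_max, w_ch)
    print(f"W_ch = {w_ch * 1e6:.2f} um: C_eff / C_ox = {c_eff / c_ox:.3f}")
```

The ratio C_eff/C_ox is always below one, which is the reduced capacity relative to a surface channel device, and it changes with W_ch, which is the dependence on the amount of charge present.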

Nonetheless, buried channel devices offer significant advantages in charge transfer
efficiency. By keeping the signal charge away from the Si-SiO₂ surface, the interaction of the
charge packet with traps at the interface is essentially eliminated. Although bulk traps also
occur, they are much less frequent than those at the interface, and from a process
standpoint, it is much easier to reduce the bulk state density than that of interface states [92].
Furthermore, the reduced effective capacitance between the signal charge and the gate
increases the fringing fields, which are the dominant driving force in the final stages of charge
transfer, as will be seen in Section 11.4. BCCDs are thus the structure of choice for large
array sensors.

In order to use a buried channel CCD for signal processing, we must first understand the
relationships between gate voltage V_g, signal charge Q_sig (coulombs/μm²), and the channel
potential φ(x). From this we can compute the magnitude of the lateral potential barrier
between  two  adjacent  gates  held  at  different  voltages,  and  hence  the  maximum  signal  charge 
per  unit  area  which  can  be  confined.  This  information  is  necessary  in  order  to  determine 
the  design  parameters  for  the  transfer  gates,  as  well  as  for  the  charge  input  and  output 
structures. 

Under the depletion approximation, the charge density profile of a buried channel
device at gate voltage V_g containing N_e electrons/μm² is as shown in Figure 11-6 [93]. We
use N_D to denote the doping concentration (donors/μm³) of the buried channel and N_A
(acceptors/μm³) to denote that of the p-type substrate. The signal charge distributes itself
over a finite width W_ch = N_e/N_D due to the attraction of the negatively-charged electrons
to the positively-charged fixed donor ions in the lattice. The depth of the charge packet,
which ends at x = x_max, is limited by the depletion region created by the junction between
the n-type implant and the p-type substrate. The widths of the space charge regions on
either side of the junction are related by the charge balance equation

\[ N_A x_p = N_D (x_n - x_{max}) \tag{11.5} \]

where x_p represents the extent of the depletion region into the substrate.

The potential profile φ(x) is obtained by integrating Poisson’s equation

\[ \frac{d^2\phi}{dx^2} = -\frac{\rho}{\epsilon_{Si}} \tag{11.6} \]

in each region of constant charge density shown in Figure 11-6. Within these four regions,
Poisson’s equation is given by

\begin{align}
\frac{d^2\phi}{dx^2} &= -\frac{qN_D}{\epsilon_{Si}}, & 0 \le x \le x_{max} - W_{ch} \tag{11.7} \\
\frac{d^2\phi}{dx^2} &= 0, & x_{max} - W_{ch} \le x \le x_{max} \tag{11.8} \\
\frac{d^2\phi}{dx^2} &= -\frac{qN_D}{\epsilon_{Si}}, & x_{max} \le x \le x_n \tag{11.9} \\
\frac{d^2\phi}{dx^2} &= \frac{qN_A}{\epsilon_{Si}}, & x_n \le x \le x_n + x_p \tag{11.10}
\end{align}

Figure 11-6: Charge density profile of a buried channel CCD with signal charge Q_sig
(using depletion approximation)

The constants of integration are determined by the conditions of continuity of φ(x) at
the region boundaries and the continuity of the electric displacement field at x = 0 and
x = x_n + x_p. These conditions are stated as:

\begin{align}
\phi(0) &= \phi_s \tag{11.11} \\
\phi(x_{max}) &= \phi_{max} \tag{11.12} \\
\phi(x_n + x_p) &= 0 \tag{11.13} \\
-\epsilon_{Si} \left. \frac{d\phi}{dx} \right|_{x=0} &= \epsilon_{ox} E_{ox} \tag{11.14} \\
-\epsilon_{Si} \left. \frac{d\phi}{dx} \right|_{x=x_n+x_p} &= 0 \tag{11.15}
\end{align}

Referring to Figure 11-4, we see that

\[ \phi_s = V_g - V_{fb} + V_{ox} \tag{11.16} \]

and since V_ox = (t_ox/ε_ox)(fixed charge density + mobile charge density) [93], it follows that

\[ V_{ox} = \frac{q\,t_{ox}}{\epsilon_{ox}} (N_D x_{max} - N_e) = \frac{q\,t_{ox}}{\epsilon_{ox}} N_D (x_{max} - W_{ch}) \tag{11.17} \]

The electric field across the oxide is given by

\[ E_{ox} = -\frac{V_{ox}}{t_{ox}} = -\frac{q}{\epsilon_{ox}} N_D (x_{max} - W_{ch}) \tag{11.18} \]

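Direct integration of equations (11.7)-(11.10) applying the constraints (11.13) and (11.14)
results in

\begin{align}
\phi(x) &= -\frac{qN_D}{2\epsilon_{Si}} \left( x - (x_{max} - W_{ch}) \right)^2 + \phi_{max}, & 0 \le x \le x_{max} - W_{ch} \tag{11.19} \\
\phi(x) &= \phi_{max}, & x_{max} - W_{ch} \le x \le x_{max} \tag{11.20} \\
\phi(x) &= -\frac{qN_D}{2\epsilon_{Si}} (x - x_{max})^2 + \phi_{max}, & x_{max} \le x \le x_n \tag{11.21} \\
\phi(x) &= \frac{qN_A}{2\epsilon_{Si}} (x - x_n - x_p)^2, & x_n \le x \le x_n + x_p \tag{11.22}
\end{align}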

From equations (11.11), (11.16), (11.17), and (11.19) we can obtain a first equation for
φ_max:

\[ \phi_{max} = V_g - V_{fb} + \frac{q\,t_{ox}}{\epsilon_{ox}} N_D (x_{max} - W_{ch}) + \frac{qN_D}{2\epsilon_{Si}} (x_{max} - W_{ch})^2 \tag{11.23} \]

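while a second equation may be found by equating the potential across the n-p junction at
x = x_n and combining equations (11.5), (11.21), and (11.22):

\[ \phi_{max} = \frac{qN_D}{2\epsilon_{Si}} \left( 1 + \frac{N_D}{N_A} \right) (x_n - x_{max})^2 \tag{11.24} \]
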

Equating the righthand sides of equations (11.23) and (11.24) we obtain a quadratic
equation which can be solved for x_max. Given x_max we can find φ_max, and therefore φ(x),
everywhere within the silicon.

The maximum number of electrons per unit gate area which can be held in the channel
is given by:

\[ N_{e,max} = N_D x_{max} \tag{11.25} \]
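
The procedure of equating (11.23) and (11.24) and solving the resulting quadratic for x_max can be sketched numerically. The code below is an illustration under several assumptions: equation (11.24) is taken in the form φ_max = (qN_D/2ε_Si)(1 + N_D/N_A)(x_n − x_max)², obtained by carrying out the steps described in the text; SI units are used rather than the per-micron units of the text; and the doping levels, layer depth, oxide thickness, and voltages in the example are hypothetical values chosen only to exercise the function.

```python
import math

Q = 1.602e-19               # electron charge, C
EPS0 = 8.854e-12            # F/m
EPS_OX = 3.9 * EPS0
EPS_SI = 11.7 * EPS0

def solve_channel(v_g, v_fb, n_d, n_a, x_n, t_ox, n_e=0.0):
    """Return (x_max, phi_max) for a buried channel holding n_e electrons/m^2.

    n_d, n_a in m^-3; x_n, t_ox in m. Uses the assumed forms of (11.23), (11.24).
    """
    w = n_e / n_d                               # W_ch
    a = Q * n_d / (2.0 * EPS_SI)
    b = Q * n_d * t_ox / EPS_OX
    k = a * (1.0 + n_d / n_a)
    v = v_g - v_fb
    # a(x-w)^2 + b(x-w) + v = k(x_n-x)^2, collected as c2 x^2 + c1 x + c0 = 0
    c2 = a - k
    c1 = -2.0 * a * w + b + 2.0 * k * x_n
    c0 = a * w * w - b * w + v - k * x_n * x_n
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0.0:
        raise ValueError("no real solution for these parameters")
    roots = [(-c1 + s * math.sqrt(disc)) / (2.0 * c2) for s in (1.0, -1.0)]
    for x_max in roots:
        if w <= x_max <= x_n:                   # keep the root inside the buried layer
            return x_max, k * (x_n - x_max) ** 2
    raise ValueError("no root inside the buried layer")

# Hypothetical device: N_D = 1e16 cm^-3, N_A = 1e15 cm^-3, 0.5 um layer, 40 nm oxide
x_empty, phi_empty = solve_channel(5.0, 0.0, 1e22, 1e21, 0.5e-6, 40e-9)
x_load, phi_load = solve_channel(5.0, 0.0, 1e22, 1e21, 0.5e-6, 40e-9, n_e=1e15)
print(f"empty : x_max = {x_empty * 1e6:.3f} um, phi_max = {phi_empty:.2f} V")
print(f"loaded: x_max = {x_load * 1e6:.3f} um, phi_max = {phi_load:.2f} V")
```

Of the two roots of the quadratic, only the one lying between W_ch and x_n is physically meaningful, which is why the function tests the range explicitly instead of fixing a sign for the square root.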

Of  more  interest,  however,  is  the  maximum  number  which  may  be  confined  under  one 
gate  that  is  at  a  higher  voltage  than  its  neighboring  gates.  Charge  is  trapped  as  long 
as  the  channel  potential  under  one  gate  is  higher  than  that  of  its  neighbors.  As  seen  from 
equation (11.23), φ_max is a decreasing function of W_ch = N_e/N_D. The maximum value of N_e
is therefore the one for which φ_max is equal to the channel potential under the neighboring
gate. 

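Let $V_{g1}$ denote the voltage on the neighboring gate, which has zero signal charge
($W_{ch} = 0$), and let $V_{g2}$ denote the voltage on the gate containing charge $-qN_{e,max}$. From
equation (11.23), we obtain

$$\phi_{max1} = V_{g1} - V_{fb} + \frac{q\,t_{ox}}{\epsilon_{ox}} N_D x_{max1} + \frac{qN_D}{2\epsilon_{si}} x_{max1}^2 \tag{11.26}$$

and

$$\phi_{max2} = V_{g2} - V_{fb} + \frac{q\,t_{ox}}{\epsilon_{ox}} N_D \left(x_{max2} - \frac{N_{e,max}}{N_D}\right) + \frac{qN_D}{2\epsilon_{si}} \left(x_{max2} - \frac{N_{e,max}}{N_D}\right)^2 \tag{11.27}$$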


Equating the righthand sides of the above expressions and noting that equal $\phi_{max}$ implies
equal $x_{max}$, as seen from equation (11.24), we obtain

$$V_{g2} - V_{g1} = qN_{e,max}\left(\frac{1}{C_{ox}} + \frac{1}{C_{d1}}\right) = \frac{qN_{e,max}}{C_{eff}} \tag{11.28}$$

Although it appears from this equation that $N_{e,max}$ is linearly related to the difference
in gate voltages, it should be remembered that the depletion capacitance $C_{d1}$ depends both
on the size of the signal charge packet and on the value of $V_{g1}$ through its influence on $x_{max}$.
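
The algebra behind (11.28) can be sketched by subtracting (11.26) from (11.27) with $x_{max1} = x_{max2} = x_{max}$ and $W_{ch} = N_{e,max}/N_D$. Identifying the two bracketed terms with $1/C_{ox}$ and $1/C_{d1}$, that is $C_{ox} = \epsilon_{ox}/t_{ox}$ and $C_{d1} = \epsilon_{si}/(x_{max} - W_{ch}/2)$, is an assumption made here, since the definitions of Section 11.2 are not repeated in this section.

$$\begin{aligned}
V_{g2} - V_{g1} &= \frac{q\,t_{ox}}{\epsilon_{ox}} N_D W_{ch} + \frac{qN_D}{2\epsilon_{si}}\left[x_{max}^2 - \left(x_{max} - W_{ch}\right)^2\right] \\
&= qN_{e,max}\left[\frac{t_{ox}}{\epsilon_{ox}} + \frac{x_{max} - W_{ch}/2}{\epsilon_{si}}\right]
\end{aligned}$$

Because $x_{max}$ is set by $V_{g1}$ and $W_{ch}$ by the packet size, the bracket is not a constant, which is the nonlinearity noted in the text.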
11.3  Charge  Transfer  and  Clocking

The  primary  usefulness  of  CCDs  is  derived  from  their  ability  to  move  signal  charge  from 
one  location  to  another.  Raising  the  voltage  on  one  gate  and  lowering  that  on  a  neighboring 
gate  shifts  the  position  of  the  potential  well  and  also  moves  any  charge  contained  in  it.  By 
appropriately  sequencing  the  gate  voltages,  it  is  not  only  possible  to  transfer  charge  through 
a  long  array,  but  also  to  perform  arithmetic  operations,  such  as  adding  charge  packets  or 
dividing  one  packet  into  several  others.  For  now  we  will  focus  on  the  clocking  strategies 
for  charge  transfer,  since  other  operations  are  based  on  permutations  of  the  fundamental 
transfer  sequence. 

For  best  transfer  efficiency,  there  should  be  little  or  no  spacing  between  adjacent  gates 
so  that  the  lateral  potential  in  every  portion  of  the  channel  is  always  directly  controlled.  In 
order  to  maintain  electrical  isolation,  two  layers  of  polysilicon  separated  by  a  thin  oxide  are 
generally  used  for  alternating  gates.  The  clocking  sequence  for  charge  transfer  depends  on 
the number of independent clock signals connected to the gates. CCDs have been designed
using  two-,  three-,  and  four-phase  clocking  schemes  [94].  The  two-  and  three-phase  methods 
have  the  advantage  of  using  fewer  clocks  and  allowing  higher  device  density  than  four- 
phase  clocking.  Two-phase  clocking,  however,  requires  a  special  implant  and  only  allows 
charge  transfer  in  one  direction  [92].  Three-phase  clocking  allows  bi-directional  transfer 
but  necessitates  connecting  the  same  clock  signals  to  different  polysilicon  layers,  making 
it  impossible  to  adjust  the  signals  to  overcome  threshold  mismatches  between  first-  and 
second-level  poly.  Since,  as  will  be  seen  in  the  following  chapters,  the  operations  required 
by  the  MSV  algorithm  necessitate  bi-directional  charge  transfer  as  well  as  adjusting  for 
threshold  mismatches,  a  four-phase  method  was  used  in  this  design. 

The charge transfer sequence using four-phase clocking is illustrated in Figure 11-7. At
the beginning of the sequence, the signal charges are held under the gates labeled $\phi_1$ and
$\phi_2$. In the next clock cycle, the signal $\phi_3$ is brought high as $\phi_1$ is brought low, causing the
charge packets to spill into the empty potential wells created under the gates connected to
$\phi_3$ and move away from the lower potential regions now under $\phi_1$. In the following cycles
the process is repeated by raising and lowering the pairs $\phi_4$-$\phi_2$, $\phi_1$-$\phi_3$, and $\phi_2$-$\phi_4$, at which
point the sequence repeats.
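
The sequence described above can be sketched as a small state machine. This is only an illustrative model: it tracks which two clock phases are high (and therefore hold the packet) after each step, and the function names and the 0-based phase numbering are choices made for this example. Reversing the direction argument shows why the four-phase scheme supports bi-directional transfer.

```python
# Illustrative model of four-phase clocking: a packet is held under the two
# gates whose clock phases are high, and each step raises one neighbouring
# phase while lowering the phase at the opposite edge of the well.
# All names here are hypothetical; phases are 0-based (0 stands for phi_1).

PHASES = 4


def step(high, direction=1):
    """Advance the pair of high phases by one gate (direction=+1 or -1)."""
    trailing, leading = high
    if direction == 1:
        return (leading, (leading + 1) % PHASES)
    return ((trailing - 1) % PHASES, trailing)


def packet_positions(start=(0, 1), steps=4, direction=1):
    """List the pair of high phases before and after each clock step."""
    states = [start]
    for _ in range(steps):
        states.append(step(states[-1], direction))
    return states


for trailing, leading in packet_positions():
    print(f"high phases: phi_{trailing + 1}, phi_{leading + 1}")
```

After four steps the pair of high phases returns to its starting value, with the packet displaced by one full four-gate stage.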
Figure 11-7: Charge transfer sequence with four-phase clocking.
11.4  Transfer  Efficiency

One of the most important characteristics of charge-coupled devices is their charge
transfer efficiency $\epsilon$, defined as the fraction of the total charge transferred to the receiving
well in one clock cycle. A related quantity, the transfer inefficiency $\eta$, given by

$$\eta = 1 - \epsilon \tag{11.29}$$

is the fraction of total charge left behind.

In order for large CCD arrays to be useful for analog processing, transfer efficiencies
greater than $\epsilon = 0.99999$ are required. The consequences of poor transfer efficiency can be
easily understood from a simple calculation. After transferring a packet of initial size $Q_0$
through $N$ gates, the final packet size, $Q_N$, is given by

$$Q_N = \epsilon^N Q_0 \tag{11.30}$$

With $N = 1000$ and $\epsilon = 0.99999$, $Q_N/Q_0 = 0.99$, implying that by the time the charge
packet reaches the end of the array, the signal will be diminished by 1% of its original value.
With $\epsilon = 0.9999$, however, we would have $Q_N/Q_0 = 0.90$, resulting in a 10% loss in the
signal.
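
The two cases quoted above are easy to check numerically. The short sketch below evaluates equation (11.30) directly; the function name is illustrative.

```python
def remaining_fraction(efficiency, n_transfers):
    """Q_N / Q_0 after n_transfers transfers, each with the given efficiency."""
    return efficiency ** n_transfers


for eps in (0.99999, 0.9999):
    frac = remaining_fraction(eps, 1000)
    print(f"eps = {eps}: Q_N/Q_0 = {frac:.3f}, loss = {100 * (1 - frac):.1f}%")
```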

Unless there is recombination in the channel, which should not occur to a significant
extent in a well-designed device, the charge is not actually lost but is dispersed over the
array, causing successive packets to be contaminated by the charge left behind. It can easily
be seen that the amount of charge left under gate $i$, counting from $i = 0$ initially, is given
by the binomial formula [93]

$$Q_i = Q_0 \binom{N}{i} \epsilon^i (1 - \epsilon)^{N-i} \tag{11.31}$$
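
A minimal sketch of this dispersal, assuming the binomial form $Q_0 \binom{N}{i} \epsilon^i (1-\epsilon)^{N-i}$ for the charge found under gate $i$ after $N$ transfers. The function name and the example values ($N = 100$, $\epsilon = 0.9999$) are illustrative only.

```python
from math import comb


def charge_distribution(q0, efficiency, n):
    """Charge under gate i (i = 0..n) after n transfers, binomial model."""
    return [q0 * comb(n, i) * efficiency ** i * (1 - efficiency) ** (n - i)
            for i in range(n + 1)]


dist = charge_distribution(q0=1.0, efficiency=0.9999, n=100)
print(f"charge that kept up with the packet: {dist[-1]:.5f}")
print(f"charge one gate behind:              {dist[-2]:.5f}")
print(f"total charge (should stay at q0):    {sum(dist):.5f}")
```

The terms sum to $Q_0$, which is the statement that charge is dispersed rather than lost, and the last term reproduces $\epsilon^N Q_0$ from equation (11.30).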

There  are  three  primary  mechanisms  causing  poor  transfer  efficiency.  The  first  is  charge 
trapping  by  interface  or  bulk  states.  As  previously  explained,  one  of  the  primary  advantages 
to  using  buried  channel  devices  is  the  fact  that  the  density  of  bulk  states  is  much  lower 
than that of the interface states at the Si-SiO2 surface, resulting in much higher transfer
efficiencies  in  BCCDs  than  in  surface  channel  devices. 


The second cause of poor charge transfer is a ‘bumpy’ channel. The lateral potential
profile  between  the  gate  spilling  the  charge  packet  and  the  gate  receiving  the  packet  does 
not  necessarily  increase  monotonically.  Potential  ‘bumps’,  which  can  keep  some  amount 
of  charge  from  reaching  the  neighboring  well,  can  occur  when  the  inter-gate  spacing  is 
too  large,  when  there  are  changes  in  the  channel  width,  or  when  there  are  corners  in  the 
channel [86]. In addition, since $\phi_{max}$ is directly related to $t_{ox}$, as seen from equation (11.23),
potential bumps can be caused by variations in the oxide thickness over the length of
the transfer gate.

The  third  mechanism  affecting  charge  transfer  is  clock  frequency.  Intuitively,  it  is  clear 
that  for  the  charge  to  be  completely  transferred,  it  has  to  be  allowed  enough  time  to  reach 
its  destination.  Quantitatively,  the  time  needed  for  complete  charge  transfer  in  the  absence 
of  traps  can  be  determined  by  analyzing  the  forces  driving  the  charge  packet. 

When the empty potential well is created next to the charge packet by raising the voltage
on the neighboring gate, the initial force pushing the charge into the new well is the mutual
repulsion of the negatively charged electrons which generates a self-induced field, $E_{si}$. As
the electron concentration decreases in the lower potential region being emptied, the self-induced
field becomes less important and the fringing field, $E_{ff}$, becomes dominant [92],
[93]. The fringing field is created by the lateral influence of the neighboring gates which
are at a high voltage on the charge remaining in the emptying well. Because this influence
increases as the effective capacitance, $C_{eff}$, between the charge and the gate directly above
it decreases, the fringing fields are larger in buried channel than in surface channel devices,
further enhancing their transfer efficiency. The final stage of charge transfer, after the self-induced
and fringing fields have become negligible, is dominated by thermal diffusion which
is the slowest of the three transport mechanisms [92].

Determining the exact time-varying charge distribution during transfer requires numerical
solutions to the time-dependent Poisson’s equation and continuity conditions. Approximate
solutions, however, have been derived in references [93] and [95]. Citing the results
from the first reference, which give somewhat more insight into the nature of the solution,
the transfer inefficiency after time $t$ due to self-induced drift is


$$\eta_{si}(t) = \frac{1}{1 + t/t_{si}} \tag{11.32}$$
where

(11.33) 

T'PnQo 

with $L$ being the length of the transfer gate and $\mu_n$ the electron mobility in the channel.
The  transfer  inefficiency  due  to  fringe-field  drift  after  time  t  is  given  by 


(11.34) 


^  2u  \E  ■  \ 

The minimum strength of the fringing field, $|E_{min}|$, is given by


(11.35) 


(11.36) 


where $\Delta V_g$ is the difference in the gate voltages on the discharging and receiving gates.

From  equations  (11.33),  (11.35),  and  (11.36)  it  is  clear  that  the  characteristic  transfer 
times  increase  at  least  as  fast  as  the  square  of  the  gate  length,  L.  Minimizing  this  parameter 
is  thus  crucial  to  designing  a  processor  with  both  adequate  speed  and  transfer  efficiency. 


11.5  Power  Dissipation 

11.5.1  On-chip  dissipation 

Because  signal  charge  moves  through  the  CCD  array,  there  is  an  effective  current  flowing 
across  a  resistive  medium  causing  power  to  be  dissipated  on  chip.  The  dissipation  per  gate 
is  given  by  [93], 

Pgate  =  -  (J  ’  E) 

P 

=  -Q(v-E) 

=  (11.37) 

P'n 

where $J$ is the current density, $E$ is the lateral electric field, $Q$ is the charge under the gate,
$v$ is the average charge velocity, $f_c$ is the clock frequency, $L$ is the gate length and $\mu_n$ is the
carrier mobility.
To obtain a rough estimate of the on-chip power dissipation, we can set $\mu_n = 1500\,\mathrm{cm^2/V\text{-}sec}$,
$f_c = 5$ MHz, $L = 10\,\mu\mathrm{m}$, and $Q = -6.4 \times 10^{-14}$ coulombs, which is the charge of
400000 electrons, giving

$$P_{gate} = 10^{-9}\,\mathrm{W} \tag{11.38}$$

Even if the array contained $10^6$ gates, the total on-chip dissipation due to charge transfer
would be no more than 1 mW. We can thus for all practical purposes ignore this contribution
to the total power required to operate the processor.
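
The order of magnitude can be reproduced with a short calculation. The sketch below assumes that equation (11.37) reduces, in magnitude, to $|Q|\,v^2/\mu_n$ with the average velocity taken as $v = f_c L$ (one gate length per clock period); that reading, and the function name, are assumptions made for this example.

```python
Q_E = 1.602e-19          # electron charge [C]
MU_N = 1500.0            # electron mobility [cm^2/(V s)]
F_C = 5e6                # clock frequency [Hz]
L_GATE = 10e-4           # gate length [cm] (10 um)
N_ELECTRONS = 400_000    # packet size [electrons]


def on_chip_power_per_gate(n_electrons, f_c, l_gate, mu_n):
    """|Q| * v^2 / mu_n with v = f_c * L (assumed reading of eq. 11.37)."""
    v = f_c * l_gate                       # average charge velocity [cm/s]
    return n_electrons * Q_E * v ** 2 / mu_n


p_gate = on_chip_power_per_gate(N_ELECTRONS, F_C, L_GATE, MU_N)
p_array = 1e6 * p_gate                     # one million gates
print(f"P_gate  ~ {p_gate:.2e} W")
print(f"P_array ~ {p_array * 1e3:.2f} mW for 1e6 gates")
```

Under these assumptions the per-gate figure comes out at about a nanowatt, consistent with the estimate in (11.38).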

11.5.2  Power  dissipation  in  the  clock  drivers 

The  primary  source  of  power  dissipation  in  operating  a  CCD  array  is  in  the  clock  drivers 
which  must  supply  the  current  to  charge  and  discharge  the  large  capacitive  loads  from  all 
of  the  gates  tied  to  a  given  clock  phase  at  each  cycle.  For  a  square-wave  signal  the  energy 
dissipated  in  the  internal  resistance  of  the  clock  driver  when  either  charging  or  discharging 
a  capacitance  C  is  given  by  [93], 

$$E_{clock} = \frac{1}{2} C V^2 \tag{11.39}$$

where  V  is  the  voltage  swing  on  the  capacitor.  Since  each  gate  is  charged  and  discharged 
once  per  transfer  cycle,  the  power  dissipated  per  gate  is 

$$P_{gate} = C V^2 f_t \tag{11.40}$$

with $f_t$ being the transfer cycle frequency.

A 100 µm² gate with nominal capacitance of 0.5 fF/µm² operating with a 5V swing and
1 MHz transfer cycle frequency thus requires

$$P_{gate} = 1.25 \times 10^{-6}\,\mathrm{W} \tag{11.41}$$

to be dissipated in the driver circuit. Operating a 256x256 four-phase array would therefore
consume at least 82 mW.
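
These figures follow directly from equation (11.40). The sketch below repeats the arithmetic; the function name is illustrative, and the array total assumes one such gate is counted per array site, which is how the 82 mW figure appears to be reached.

```python
def clock_power_per_gate(area_um2, cap_per_um2, v_swing, f_transfer):
    """P_gate = C * V^2 * f_t for a gate of the given area (eq. 11.40)."""
    capacitance = area_um2 * cap_per_um2          # total gate capacitance [F]
    return capacitance * v_swing ** 2 * f_transfer


p_gate = clock_power_per_gate(area_um2=100.0, cap_per_um2=0.5e-15,
                              v_swing=5.0, f_transfer=1e6)
p_array = 256 * 256 * p_gate                      # assumed: one gate per site
print(f"P_gate  = {p_gate:.3e} W")
print(f"P_array = {p_array * 1e3:.1f} mW")
```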

Power  dissipation  in  the  supporting  circuitry  can  be  reduced  by  using  tuned  sinusoidal 
drivers  [93].  In  these  circuits,  the  total  power  required  to  drive  each  gate  is 


Pgate  =  ^CV^  ft 

Qj 


(11.42) 
where $Q_f$ is the quality factor of the driving oscillator. With well-tuned circuits, power
dissipation in the clock drivers can be reduced significantly.

11.6  Charge  Input  and  Output 

In  order  to  interface  other  circuits  to  a  CCD  processor,  signal  charge  must  be  introduced 
into  the  array  at  some  point  and  the  stored  charge  must  later  be  converted  into  a  usable 
output  signal.  This  chapter  is  thus  concluded  by  discussing  the  specific  I/O  structures  used 
in  the  MSV  processor. 

11.6.1  Fill-and-Spill  input  method 

Input charge may be generated either electrically or optically. Optical input is of course
necessary for image sensors; in a test circuit, however, it is difficult to control. In designing
the prototype MSV processor, an electrical technique known as the fill-and-spill method
was used to generate the repeatable input signals required for testing the chip.

The fill-and-spill method exploits the relation (11.28) between maximum signal charge
and the difference in neighboring gate voltages. The process is illustrated in Figure 11-8.
Voltages $V_{ref}$ and $V_{in}$ are applied to two adjacent gates creating the relative potential profile
shown, while the potential barrier to the right of the input gate is created by holding the
stop gate (SG) at a lower voltage than $V_{ref}$.

Signal charge is supplied via the ohmic contact made to the n+ diffusion adjacent to
the reference gate. In the fill stage, the diffusion potential, controlled by $V_d$, is lowered
causing electrons to flood the higher potential regions beneath both the reference and input
gates. On the next clock cycle, $V_d$ is raised so that the diffusion potential is well above
the reference channel potential, causing excess electrons to spill back into the diffusion and
leave behind a charge packet $Q$ of size, theoretically, given by

$$Q = (V_{in} - V_{ref})\,C_{in} \tag{11.43}$$

where  Cin  is  the  total  capacitance  of  the  input  gate.  Following  the  spill  operation,  the  signal 
charge  can  be  moved  into  the  array  by  appropriately  clocking  the  stop  gate  and  transfer 
gate  voltages. 

Several variations on the basic structure shown in Figure 11-8 have been used with CCDs.
Since it is often useful to maintain a linear relation between the charge packet size and the
voltage difference ($V_{in} - V_{ref}$), fill-and-spill structures are sometimes built from surface
channel devices [86]. To avoid threshold mismatches, the reference and input gates are
usually designed in the same level of polysilicon. This necessitates placing either a floating
diffusion or a dummy gate in second-level poly [65] to maintain the lateral continuity of the
channel. In the MSV processor, threshold mismatch in the input stage was not a concern as
the input test signals were completely controllable and any effects due to mismatch could
be removed by adjusting the input level. In the present design, it was thus simpler to use
different polysilicon levels for the input and reference gates, as shown in the diagram.

The primary difficulty in designing a fill-and-spill device to produce consistent output
levels for given values of $V_{in}$ and $V_{ref}$ is that the charge packet size is in fact a random
variable. First, the amount of charge stored in the capacitor under the input gate will
fluctuate due to thermal noise, with the mean value of the fluctuations given by [96]

$$\Delta Q = \sqrt{kTC_{in}} \tag{11.44}$$

where $k$ is Boltzmann’s constant, and $T$ is the absolute temperature. The ratio of noise to
the total signal is thus

$$\frac{\Delta Q}{Q} = \frac{1}{V_{in} - V_{ref}} \sqrt{\frac{kT}{C_{in}}} \tag{11.45}$$
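
To give a feel for the magnitudes, the sketch below evaluates the packet size from (11.43) and the thermal fluctuation $\sqrt{kTC_{in}}$ in units of electrons. The 50 fF input capacitance, the 1 V input difference, the 300 K temperature, and the function names are hypothetical values chosen for illustration, not figures from the design.

```python
from math import sqrt

K_B = 1.380649e-23       # Boltzmann constant [J/K]
Q_E = 1.602e-19          # electron charge [C]


def packet_electrons(v_in, v_ref, c_in):
    """Ideal fill-and-spill packet size, (V_in - V_ref) * C_in, in electrons."""
    return (v_in - v_ref) * c_in / Q_E


def ktc_noise_electrons(c_in, temp=300.0):
    """Thermal charge fluctuation sqrt(k T C_in), in electrons."""
    return sqrt(K_B * temp * c_in) / Q_E


def noise_to_signal(v_in, v_ref, c_in, temp=300.0):
    """Ratio of the thermal fluctuation to the ideal packet size."""
    return sqrt(K_B * temp / c_in) / (v_in - v_ref)


C_IN = 50e-15            # hypothetical input gate capacitance [F]
signal = packet_electrons(1.0, 0.0, C_IN)
noise = ktc_noise_electrons(C_IN)
ratio = noise_to_signal(1.0, 0.0, C_IN)
print(f"signal ~ {signal:.0f} e-, noise ~ {noise:.0f} e-, ratio ~ {ratio:.1e}")
```

For small input differences the ratio grows as $1/(V_{in} - V_{ref})$, so the same fluctuation becomes a larger fraction of the packet.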


A  second  problem  causing  the  signal  charge  to  fluctuate  is  thermionic  emission.  This  is 
the  problem  discovered  by  Keast  [86]  which  limited  the  resolution  of  the  CCD-based  absolute 
value  of  difference  circuit.  This  circuit,  which  is  essentially  composed  of  two  cross-coupled 
fill-and-spill  structures,  has  a  dead  zone  for  small  input  differences  due  to  the  fact  that 
small  potential  barriers  are  not  very  effective  in  holding  back  the  signal  charge.  Thermionic 
emission  is  caused  by  the  finite  kinetic  energy  of  the  electrons.  Just  as  water  will  boil  over 
the  side  of  a  pot  filled  to  the  rim,  the  kinetic  energy  of  electrons  at  the  ‘top’  of  a  potential 
barrier  can  give  them  enough  boost  to  jump  over  the  side. 

In the fill-and-spill structure, the barrier under the reference gate is used to block electrons
from spilling back into the high potential diffusion. Because of the thermionic effect,
however, fewer electrons than predicted by equation (11.43) will remain under the input
gate. Given sufficient time, the charge level will drop until the potential difference under
the input and reference gates is large enough to overcome the average kinetic energy of
the electrons. If the difference between $V_{in}$ and $V_{ref}$ is small, however, this may not occur
before all of the signal charge is lost.

One  solution  to  limiting  the  effect  of  thermionic  emission  is  to  run  the  input  process 
fast  enough  so  that  most  of  the  energetic  electrons  do  not  have  time  to  jump  the  barrier. 
This  idea  was  used  by  Keast  to  improve  the  resolution  of  the  absolute  value  of  difference 
circuit.  It  is  not  a  good  idea,  however,  for  operating  the  input  circuit  if  stable  signal  levels 
are  desired,  since  the  amount  of  charge  lost  per  unit  time  is  not  a  well-controlled  quantity. 
For  best  stability,  the  fill-and-spill  structure  should  be  operated  slowly  so  that  the  amount 
of  charge  will  decrease  to  the  level,  which  is  a  function  only  of  the  temperature  T,  where 
the  emission  current  is  negligible. 


11.6.2  Floating  gate  amplifier  output  structures 


Sensing  the  amount  of  charge  in  a  given  packet  is  usually  performed  by  injecting  it  onto 
the  ‘floating’  plate  of  a  precharged  capacitor  and  measuring  the  resulting  change  in  voltage. 
Typically,  the  sensing  capacitor  is  either  a  gate  or  a  diffusion  connected  to  a  high  input 
impedance  buffer.  In  the  present  design,  the  floating  gate  technique  was  used  exclusively 
because  it  allows  for  non-destructive  sensing,  which  is  necessary  since  measurements  must 
be  made  at  several  different  stages.  In  addition,  the  floating  gate  structure  allows  greater 
sensitivity,  lower  noise,  and  better  matching  than  the  floating  diffusion  structure  [97]. 

A diagram of a floating gate structure placed in a CCD array is shown in Figure 11-9a.
A reset transistor initializes the gate to a voltage $V_i$ and is then turned off, disconnecting
the gate from the clock signal. Charge is injected into the potential well under the floating
gate by clocking the other gates as described in Section 11.3. The change in voltage,
$\Delta V_g = V_f - V_i$, is buffered by a source follower which provides a high impedance connection
to the gate and sets the load capacitance to a fixed value.

The relation between signal charge, $Q_{sig}$, and $\Delta V_g$ is best understood using the lumped
capacitance model of Figure 11-9b. $C_{ox}$, $C_{d1}$, and $C_{d2}$ are as defined in Section 11.2 while
$C_{load}$ represents the combined normalized capacitance from the source follower input gate,
the drain diffusion of the reset transistor, the overlap of the adjacent gates, and parasitic
sidewall capacitances.

Normalization by the gate area, $A_g$, is necessary to maintain consistency of units.


(11.46) 


a.) Floating gate structure placed in linear array.

Figure 11-9: Charge sensing using the floating gate technique.
The channel charge is divided between the capacitors $C_{eff}$ and $C_{d2}$, whose common
‘plate’ is at the potential maximum $\phi_{max}$. If charge $Q_{sig}$ is injected into the channel, it will
also divide between the two capacitors such that

$$Q_{sig} = Q_{eff} + Q_{d2} \tag{11.47}$$

where

$$Q_{eff} = C_{eff}\left(\Delta\phi_{max} - \Delta V_g\right) \tag{11.48}$$

and

$$Q_{d2} = C_{d2}\,\Delta\phi_{max} \tag{11.49}$$

The quantity $\Delta\phi_{max}$ represents the change in the maximum channel potential, while ($\Delta\phi_{max} - \Delta V_g$)
is the change in voltage across $C_{eff}$ as seen from the ‘bottom plate’.

The charge $Q_{eff}$ on the bottom plate of $C_{eff}$ must be mirrored by an equal charge of
opposite polarity, $-Q_{eff}$, on the top plate. Since the total charge shared between the top
plate of $C_{eff}$ and $C_{load}$ is constant, the charge on $C_{load}$ must increase by $+Q_{eff}$:

$$\Delta Q_{load} = +Q_{eff} \tag{11.50}$$
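
The step from these charge-sharing relations to the gate voltage change can be sketched as follows. This is a reconstruction under the assumption that the charge added to the load capacitance is $\Delta Q_{load} = C_{load}\,\Delta V_g$; it is meant as a check on the algebra rather than a reproduction of the original intermediate equations.

$$\begin{aligned}
C_{load}\,\Delta V_g &= C_{eff}\left(\Delta\phi_{max} - \Delta V_g\right)
  \;\Rightarrow\;
  \Delta\phi_{max} = \Delta V_g\,\frac{C_{load} + C_{eff}}{C_{eff}} \\
Q_{sig} &= C_{load}\,\Delta V_g + C_{d2}\,\Delta\phi_{max}
  = \Delta V_g \left( C_{load} + C_{d2} + \frac{C_{d2}\,C_{load}}{C_{eff}} \right) \\
\Delta V_g &= \frac{Q_{sig}}{C_{load}}
  \left( \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff} + 1/C_{load}} \right)
\end{aligned}$$

Since $Q_{sig}$ is negative for a packet of electrons, the floating gate voltage drops below its initial value when charge is injected.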


In designing a floating gate amplifier, we normally want as much voltage swing at the
output as possible. The final voltage on the gate, $V_f$, is limited however by the charge
capacity equation (11.28). If we let $V_l$ represent the low clock voltage which is applied to
the gates neighboring the sense node and let $Q_{max} = -qN_{e,max}$ represent the maximum
signal, then from (11.28) we must have

$$V_f - V_l \ge -\frac{Q_{max}}{C_{eff}} \tag{11.54}$$


To simplify notation, let $\alpha$ denote the quantity in parentheses in equation (11.53):

$$\alpha = \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff} + 1/C_{load}} \tag{11.55}$$

such that the minimum final gate voltage, $V_{f,min}$, is given by

$$V_{f,min} = V_i + \alpha\,\frac{Q_{max}}{C_{load}} \tag{11.56}$$


We can then eliminate $Q_{max}$ from equations (11.54) and (11.56) to obtain

$$V_{f,min} = \frac{C_{load} V_i + \alpha C_{eff} V_l}{C_{load} + \alpha C_{eff}} \tag{11.57}$$


For given values of $V_i$ and $V_l$, we can design the floating gate for a desired $V_{f,min}$ by
adjusting $C_{load}$. The smaller $C_{load}$ is with respect to $\alpha C_{eff}$, the closer $V_{f,min}$ will be to
$V_l$. We do have to be careful, however, to not make $C_{load}$ too small as its value also affects
the maximum charge size, and therefore the signal to noise ratio. Eliminating $V_f$ from
equations (11.54) and (11.56), we obtain the following expression for $N_{e,max}$

$$N_{e,max} = \frac{1}{q} \left( \frac{V_i - V_l}{\alpha/C_{load} + 1/C_{eff}} \right) \tag{11.58}$$

As $C_{load} \to 0$ so does $N_{e,max}$.
Chapter 12

CCD  Edge  Detector  Design


The  floor  plan  of  the  complete  MSV  edge  detection  processor  is  shown  in  Figure  12-1. 
There  are  four  basic  operations  performed  by  the  processor: 

1.  Charge  input, 

2.  Smoothing, 

3.  Computing  the  magnitude  of  the  difference  between  values  at  neighboring  nodes,  and 

4.  Storing  and  reading  out  the  binary  edge  signals. 

A fill-and-spill structure placed at the top of the block marked ‘Charge Input and Shift
Register’  is  used  to  load  each  pixel  of  the  image,  converting  the  brightness  values  to  signal 
charge.  An  image  is  loaded  into  the  array  one  column  at  a  time,  using  the  vertical  shift 
register  to  move  the  pixels  to  their  appropriate  rows.  Once  the  last  pixel  of  the  column  has 
been  read  in,  the  contents  of  the  entire  shift  register  are  transferred  horizontally  into  the 
processing  array. 

Both  the  smoothing  and  differencing  operations  are  performed  within  the  array.  Edge 
charges  are  stored  separately  at  each  cell  for  every  horizontal  and  vertical  node  pair.  They 
are not coalesced on-chip into one edge signal per node, as described at the end of Section 5.3,
in order to leave more flexibility in the design of external circuits which interface
to  the  processor.  The  horizontal  and  vertical  edge  signals  for  one  row,  which  is  selected 
by  the  decoder  block  to  the  right  of  the  array,  can  be  output  in  parallel  at  the  end  of  any 
smoothing/differencing  cycle  without  disrupting  the  proper  operation  of  the  algorithm. 
Figure 12-1: Floor plan of the complete MSV processor.


In  the  following  sections  I  will  describe  the  design,  operation,  and  layout  of  the  circuits 
used  in  the  prototype  MSV  processor  fabricated  through  MOSIS.  I  will  first  discuss  the 
architecture  of  the  unit  cells  which  compose  the  array  and  then  present  the  detailed  design 
of  the  CCD-  and  transistor-based  processing  elements.  Test  results  from  the  fabricated 
circuits  are  presented  in  the  next  chapter. 

12.1  CCD  Processing  Array 

A block diagram of the unit cell is shown in Figure 12-2, with the corresponding layout,
measuring 224λ × 224λ¹, shown in Figure 12-3. At the boundary of the cell are the CCD
gates in alternating levels of polysilicon which are sized so that when cells are abutted to
form the processing array, the gate structure seen in Figure 5-3 results. Signal charges
proportional to the pixel brightness values are stored under the large gates at the corners of
the cell. One floating gate amplifier (FGA) per cell senses the charge under the gate in the
lower lefthand corner and feeds the output voltage to the four differential amplifiers which
are paired with its nearest neighbors. For reasons dictated by the layout, it was simplest to
have the floating gate amplifier communicate with the differencing circuit directly adjacent
to it and to those in the neighboring cells to the west and southwest. The edge signals
stored in the blocks marked ‘Horizontal’ and ‘Vertical’ thus correspond to edges between
the two nodes at the base of the cell and the two on the righthand vertical side, respectively.

¹In the Orbit CCD process λ = 1 µm.

Figure 12-2: Unit cell architecture.

Figure 12-3: Unit cell layout.

Figure 12-4: Floating gate design.

12.1.1  Charge  sensing 

The more detailed picture of the floating gate amplifier used for charge sensing at the
signal nodes is shown in Figure 12-4. The clock phase $\phi_1$ is gated through the p-type reset
transistor controlled by the signal $V_{fg}$. When $V_{fg}$ is brought low, the node gate voltage
is controlled by $\phi_1$ for the purposes of charge transfer and storage. When $V_{fg}$ is brought
high, however, the gate is left floating and can thus be used for measuring charge levels as
described in Section 11.6.2.

| Parameter | Value | Units |
|---|---|---|
| $N_D$ | $3.8 \times 10^{16}$ | cm⁻³ |
| $N_A$ | $5.05 \times 10^{15}$ | cm⁻³ |
| $t_{ox}$ | 420 | Å |
| $V_{fb}$ | 0.6 | V |
| $x_n$ | 0.4 | µm |
| $C_{ox}$ | $8.10 \times 10^{-16}$ | F/µm² |
| $C_{p1-p2}$ | $5.0 \times 10^{-16}$ | F/µm² |
| $C_{jsw0}$ | $3.46 \times 10^{-16}$ | F/µm |
| $\phi_{bi}$ | 0.8 | V |

Table 12.1: Orbit CCD/CMOS process parameters (from 10-28-93 run).

In order to sense the signal level, the node must be initially emptied by transferring
the charge out through the four connecting branches. Once this is done and $\phi_1$ is brought
high, $V_{fg}$ is also brought high, turning off the reset transistor and initializing the node gate
voltage to the value $V_i$. The signal charge is then returned and dumped back into the empty
potential well, causing the floating gate voltage to change to the value $V_f$, where

$$V_f = V_i + \frac{Q_{sig}}{C_{load}} \left( \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff} + 1/C_{load}} \right) \tag{12.1}$$


The minimum value which $V_f$ can attain is, from equation (11.57),

$$V_{f,min} = \frac{C_{load} V_i + \alpha C_{eff} V_l}{C_{load} + \alpha C_{eff}} \tag{12.2}$$

where $V_l$ is the low clock voltage and

$$\alpha = \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff} + 1/C_{load}} \tag{12.3}$$


In the test system designed to evaluate the prototype processor, the clock drivers were
operated between voltages $V_i = 4.5$V and $V_l = 0.6$V. Targeting a full scale swing of 2V,
i.e., $V_{f,min} = 2.5$V, we can compute the nonlinear depletion capacitances $C_{d1}$ and $C_{d2}$ with
$N_e = N_{e,max}$ from the equations developed in Section 11.2 using the values of the Orbit
process parameters given in Table 12.1. From equation (12.2) we then obtain the required
load capacitance, $C_{load}$. The results are given below in Table 12.2.
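
As a consistency check on these design equations, the sketch below evaluates $\alpha$, $V_{f,min}$ and $N_{e,max}$ from the rounded capacitances of Table 12.2 (taken here as $C_{d2} = 9.1 \times 10^{-17}$, $C_{eff} = 2.9 \times 10^{-16}$ and $C_{load} = 1.4 \times 10^{-16}$ F/µm²). The function names are illustrative, and the expressions are those of equations (12.2), (12.3) and (11.58).

```python
Q_E = 1.602e-19      # electron charge [C]

C_D2 = 9.1e-17       # depletion capacitance below the channel [F/um^2]
C_EFF = 2.9e-16      # effective gate-to-channel capacitance [F/um^2]
C_LOAD = 1.4e-16     # normalized load capacitance [F/um^2]
V_I = 4.5            # initial (high) gate voltage [V]
V_L = 0.6            # low clock voltage [V]


def alpha(c_d2, c_eff, c_load):
    """Charge-division factor of equation (12.3)."""
    return (1 / c_d2) / (1 / c_d2 + 1 / c_eff + 1 / c_load)


def v_f_min(v_i, v_l, c_d2, c_eff, c_load):
    """Minimum floating gate voltage, equation (12.2)."""
    a = alpha(c_d2, c_eff, c_load)
    return (c_load * v_i + a * c_eff * v_l) / (c_load + a * c_eff)


def n_e_max(v_i, v_l, c_d2, c_eff, c_load):
    """Charge capacity in electrons per um^2, equation (11.58)."""
    a = alpha(c_d2, c_eff, c_load)
    return (v_i - v_l) / (a / c_load + 1 / c_eff) / Q_E


a = alpha(C_D2, C_EFF, C_LOAD)
vf = v_f_min(V_I, V_L, C_D2, C_EFF, C_LOAD)
ne = n_e_max(V_I, V_L, C_D2, C_EFF, C_LOAD)
print(f"alpha = {a:.3f}, V_f,min = {vf:.2f} V, N_e,max = {ne:.0f} electrons/um^2")
```

With these rounded inputs the minimum gate voltage comes out close to the 2.5 V target, and the charge capacity is of the same size as, though not identical to, the tabulated value, as would be expected from two-digit capacitances.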

| Parameter | Value | Units |
|---|---|---|
| $N_{e,max}$ | 3272 | electrons/µm² |
| $\phi_{max}$ | 4.20 | V |
| $x_{max}$ | 0.27 | µm |
| $C_{d1}$ | $4.6 \times 10^{-16}$ | F/µm² |
| $C_{d2}$ | $9.1 \times 10^{-17}$ | F/µm² |
| $C_{eff}$ | $2.9 \times 10^{-16}$ | F/µm² |
| $C_{load}$ | $1.4 \times 10^{-16}$ | F/µm² |

Table 12.2: Floating gate parameter values for $V_g = 2.5$V and $N_e = N_{e,max}$.

The voltage change on the floating gate is measured via a source-follower buffer whose
design was determined by two primary considerations. The first was the need to minimize
the loading capacitance on the floating gate, while the second, and most critical, was the
need to have the output voltage in the correct range for interfacing directly to the differential
amplifiers. For the latter reason, an n-type source follower was used, despite the fact that
a higher gain could be achieved with a p-type design with separate wells connected to the
sources of the bias and input transistors.

Within these considerations, it was of course desirable for the gain to be as high as possible.
The theoretical small signal gain of the n-type configuration, as shown in Figure 12-4,
is given by [98]

$$\frac{v_o}{v_i} = \frac{g_m}{g_m + g_{mb} + 1/R_{eff}} \tag{12.4}$$

where $g_m$ and $g_{mb}$ are the small signal gate-source and source-bulk transconductances,
respectively, of the input transistor and $R_{eff}$ is the parallel combination of the input and
bias transistor output resistances. Since little can be done to reduce $g_{mb}$, maximizing the
gain involves making $R_{eff}$ as large as possible compared to $1/g_m$.

To increase the drain resistances, both transistors were drawn with long channels to
reduce channel length modulation, and the W/L ratio of the bias transistor was dimensioned
to make the drain current, $I_D$, small. Since the output resistance is proportional to $1/I_D$
while $g_m$ increases as $\sqrt{I_D}$ [98], the product $g_m R_{eff}$ increases as $1/\sqrt{I_D}$ when the
drain current is lowered. The W/L ratio of the input transistor, on the other hand, was determined
by the output voltage range needed by the differential amplifiers. The sizes used in the final
design were W/L = 6λ/4λ for the input transistor and W/L = 4λ/8λ for the bias transistor.
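The dependence of the gain in equation (12.4) on the drain current can be illustrated with a short numerical sketch. It assumes simple square-law device behavior; the lumped transconductance factor `k`, the ratio `body_ratio` of $g_{mb}$ to $g_m$, and the channel length modulation parameter `lam` are hypothetical illustration values, not Orbit process data.

```python
import math

def source_follower_gain(g_m, g_mb, r_eff):
    """Small-signal gain of equation (12.4): g_m / (g_m + g_mb + 1/R_eff)."""
    return g_m / (g_m + g_mb + 1.0 / r_eff)

def gain_at_current(i_d, k=1e-4, body_ratio=0.2, lam=0.02):
    """Gain versus drain current under square-law assumptions.

    k (A/V^2, lumped K*W/L), body_ratio (g_mb/g_m) and lam (1/V) are
    hypothetical values chosen only to show the trend.
    """
    g_m = math.sqrt(2.0 * k * i_d)      # g_m grows as sqrt(I_D)
    r_eff = 1.0 / (2.0 * lam * i_d)     # two equal 1/(lam*I_D) resistances in parallel
    return source_follower_gain(g_m, body_ratio * g_m, r_eff)

for i_d in (1e-6, 250e-9, 63e-9):
    print(f"I_D = {i_d:.2e} A  gain = {gain_at_current(i_d):.4f}")
```

Under these assumptions the gain approaches the limit $g_m/(g_m + g_{mb})$ as the current is lowered, which is the behavior the sizing of the bias transistor is meant to exploit.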

Figure 12-5 shows the simulated behavior of the source follower with a bias voltage
of $V_{bias}$ = 1V based on the Orbit process parameters from the 10-28-93 run. From the
simulations, $I_D$ is computed as 63nA, giving a static power dissipation of 313nW for $V_{dd}$ =
5V. The predicted DC gain is $v_o/v_i$ = 0.800, with a 3dB bandwidth of 800kHz for a 600fF
load approximating that of the differential amplifier input gates. For the maximum and
minimum inputs of 4.5V and 2.5V, the output voltages are 2.85V and 1.22V, respectively,
giving a predicted full scale swing of 1.63V.

Figure 12-5: Simulated source follower characteristic for floating gate amplifier ($V_{bias}$ = 1V).

The normalized load capacitance, $C_{load}$, on the floating gate is computed from
equation (11.46) by summing the contributions from all loads and dividing by the total gate
area. The dimensions of the node gates were 30µm × 30µm, with 40µm × 2µm of overlap
area (2µm being the minimum poly1-poly2 overlap width allowed in the Orbit design
rules) and 80µm of sidewall perimeter. The capacitances for each element loading the gate
are computed and given in Table 12.3, below. The sidewall capacitance, $C_{sw}$, which is a
function of the voltage between the buried channel and the substrate, was computed at the
minimum channel potential of 4.2V, for which the unit capacitance $C_{jsw}$ = 0.14 fF/µm. The
total capacitance from all sources is thus estimated at 82.6fF, giving

$$C_{load} = \frac{82.6\,\mathrm{fF}}{30\,\mu\mathrm{m} \times 30\,\mu\mathrm{m}} = 9.2 \times 10^{-17}\ \mathrm{F}/\mu\mathrm{m}^2 \qquad (12.5)$$

which is comfortably below the value of $1.4 \times 10^{-16}$ F/µm² required for a 2V swing.


    $C_{sf}$       input gate                   16.5 fF
                   gate-source                  13.2 fF
                   total                        29.7 fF
    $C_{drain}$    reset transistor              1.6 fF
    $C_{ovl}$      80µm² × $C_{p1-p2}$          40 fF
    $C_{sw}$       80µm × $C_{jsw}$(4.2V)       11.3 fF
    Total                                       82.6 fF

Table 12.3: Capacitances loading the floating node gate.


The number of electrons corresponding to the maximum signal is found by multiplying
$N_{e,max}$ for the minimum final voltage of 2.5V by the gate area, $A_g$. From Table 12.2, we
have $N_{e,max}$ = 3493 electrons/µm², giving a total of approximately $3.1 \times 10^6$ electrons.
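The arithmetic behind equation (12.5) and the electron count can be checked with a few lines. This is only a bookkeeping sketch: the capacitance entries are those of Table 12.3, the 30µm × 30µm gate area and the density of 3493 electrons/µm² are the values quoted in the text, and the variable names are illustrative.

```python
# Capacitances loading the floating node gate, in fF (Table 12.3).
loads_fF = {
    "source follower input gate": 16.5,
    "source follower gate-source": 13.2,
    "reset transistor drain": 1.6,
    "poly1-poly2 overlap": 40.0,
    "sidewall": 11.3,
}

gate_area_um2 = 30.0 * 30.0                      # node gate area in um^2
total_fF = sum(loads_fF.values())                # total load in fF
c_load = total_fF * 1e-15 / gate_area_um2        # normalized load in F/um^2

n_e_max = 3493                                   # electrons/um^2, as quoted in the text
total_electrons = n_e_max * gate_area_um2        # maximum signal charge in electrons

print(f"total load      = {total_fF:.1f} fF")
print(f"normalized load = {c_load:.2e} F/um^2")
print(f"maximum signal  = {total_electrons:.3g} electrons")
```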

12.1.2  Clock  sequences 
Charge  transfer  and  smoothing 

Six  different  clock  phases  were  required  to  operate  the  MSV  gate  array.  Only  four 
are  needed  to  move  charge  laterally  across  the  array  when  loading  or  unloading  an  image. 
However,  for  the  smoothing  operation  the  motion  is  along  all  four  branches  connected  to 
each  node  and  two  more  clock  phases  are  required  to  control  the  direction  of  charge  flow. 

Figure 12-6: Charge averaging operation.

For simple lateral transfer a pseudo four-phase clocking scheme was used. Charge
movement in this scheme is identical to that described in Section 11.3; however, the clocking
sequence is more complicated because the phases are not arranged in a simple 1-2-3-4
repeating pattern. The clock signals connected to each gate are shown in Figure 12-2. When
the charge is held under the node gate, the signal $\phi_1$ is high and $V_{fg}$ is low, as are the
signals $\phi_2$, $\phi_5$, and $\phi_6$ connected to the gates neighboring the node. To move charges from
the nodes on the left side of the unit cell to those on the right, $V_{fg}$ and $\phi_6$ are held low and
the following sequence is executed: ($\phi_2\uparrow$), ($\phi_1\downarrow$, $\phi_3\uparrow$), ($\phi_2\downarrow$, $\phi_4\uparrow$), ($\phi_3\downarrow$, $\phi_1\uparrow$), ($\phi_4\downarrow$, $\phi_2\uparrow$),
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{hG  (pst).  {hi,  h  1),  hi),  {hi,  ^sT),  {hi,  hi),  (<^3J,  (<;5>5i)- 

The up arrow, $\uparrow$, is used to imply that the corresponding signal is brought high, while the
down arrow, $\downarrow$, indicates that it is brought low.

The 2-D smoothing operation is performed in two passes by sequentially executing a
1-D smoothing operation along the horizontal and vertical branches. Each 1-D operation
consists of four steps: (1) splitting the charge held under the node gates into two equal
packets, (2) moving the packets out the branches connected to the node towards the mixing
gate, (3) averaging the packets from adjacent nodes, and (4) returning the averaged packets
back to the node gate where they are added together. Splitting is performed when the
charge is entirely confined under the node gate ($V_{fg}$ low and $\phi_1$ high) by first raising the
signals on the adjacent gates ($\phi_2$ and $\phi_5$ for horizontal smoothing; $\phi_6$ for vertical smoothing),
which causes the charge to distribute itself evenly over the high potential regions under the
three gates. Bringing $\phi_1$ low then divides the charge into two equal and isolated packets.

For the horizontal smoothing operation, $\phi_3$, $\phi_4$, and $\phi_6$ are held low during the splitting
phase.  Executing  the  sequence  (^3  T),  {h  I,  h  i,  h  1 ),  {h  U  h  t),  {h  h  i,  h  t  then 
moves  the  charge  packets  away  from  the  node  gate  along  the  horizontal  branches  and  creates 
the  situation  shown  at  the  top  of  Figure  12-6,  where  the  packets  from  two  adjacent  nodes 
are about to collide. The central gate controlled by $\phi_3$ on each of the branches is referred to
as  the  mixing  gate.  The  two  unequal  packets  from  the  neighboring  nodes,  initially  separated 
by  the  barrier  at  the  mixing  gate  are  added  together  when  h  is  raised  and  h  is  brought 
low.  When  h  is  subsequently  lowered  and  h  is  brought  high,  the  summed  charge  is  then 
divided  in  two.  The  averaged  packets  are  then  returned  to  their  respective  nodes,  where 
the  results  from  the  opposing  branches  are  combined,  by  executing  the  inverse  sequence  of 
that  used  to  move  them  away. 
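The four steps of the 1-D operation (split, move, average, recombine) can be mimicked on a list of charge values. The sketch below is an idealized model with lossless transfer and exact halving; the function name is illustrative, and the treatment of the two end nodes (the outward-going half simply stays with the node) is an assumption made for the sketch rather than a description of the boundary cells.

```python
def smooth_1d(charges):
    """One idealized 1-D smoothing pass over a row of node charges."""
    n = len(charges)
    # (1) Split the charge under each node gate into two equal packets.
    left = [q / 2.0 for q in charges]    # packet sent toward the left neighbor
    right = [q / 2.0 for q in charges]   # packet sent toward the right neighbor
    # (2)+(3) Packets from adjacent nodes meet at the mixing gate,
    # are summed, and the sum is divided in two.
    for i in range(n - 1):
        mixed = (right[i] + left[i + 1]) / 2.0
        right[i] = mixed
        left[i + 1] = mixed
    # (4) Return the averaged packets to their nodes and add them together.
    return [l + r for l, r in zip(left, right)]

row = [0.0, 0.0, 4.0, 0.0, 0.0]
print(smooth_1d(row))
```

For an interior node the result is $(q_{i-1} + 2q_i + q_{i+1})/4$, and the total charge in the row is unchanged by the pass.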

It can easily be shown that the horizontal operation is equivalent to convolving the
stored image with the discrete kernel

$$\frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \qquad (12.6)$$

while the vertical operation, which is performed in a similar manner by appropriately
substituting $\phi_6$ for the signals $\phi_2$ and $\phi_5$ in the clocking sequence, is equivalent to a convolution
with

$$\frac{1}{4}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \qquad (12.7)$$

The complete smoothing cycle, which consists of performing first the horizontal and then
the vertical operations, is thus equivalent to a convolution with the 2-D binomial kernel

$$\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \qquad (12.8)$$

Each successive smoothing cycle repeats this operation so that performing $n$ cycles
results in the convolution of the original image with the $(n-1)$th convolution of this kernel
with itself. For instance, two cycles correspond to a convolution with

$$\frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \qquad (12.9)$$

The size of the smoothing filter, which is controlled by the number of cycles performed, can
thus, theoretically, be made arbitrarily large.
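The composition of the kernels can be verified numerically. The following sketch builds the one-cycle kernel from the horizontal and vertical passes and convolves it with itself for additional cycles; integer weights are kept separate from the normalization so the entries can be compared directly with the matrices above. The helper names are illustrative.

```python
def convolve2d_full(a, b):
    """Full 2-D discrete convolution of two small integer matrices."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0] * cols for _ in range(rows)]
    for i, row_a in enumerate(a):
        for j, va in enumerate(row_a):
            for k, row_b in enumerate(b):
                for l, vb in enumerate(row_b):
                    out[i + k][j + l] += va * vb
    return out

def smoothing_kernel(n_cycles):
    """Integer weights and normalization of the kernel after n_cycles cycles."""
    # One cycle: horizontal [1 2 1] pass followed by vertical [1 2 1]^T pass.
    one_cycle = convolve2d_full([[1, 2, 1]], [[1], [2], [1]])
    kernel = one_cycle
    for _ in range(n_cycles - 1):
        kernel = convolve2d_full(kernel, one_cycle)
    return kernel, 16 ** n_cycles

weights, norm = smoothing_kernel(2)
for row in weights:
    print(row)
print("normalization:", norm)
```

Each cycle widens the kernel by two samples in each direction, so `n` cycles give a $(2n+1) \times (2n+1)$ binomial filter.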

Avoiding  backspill 

One problem which arises from using two polysilicon levels in the gate array is that the
channel potentials are not the same for equal gate voltages applied to both levels. For the
Orbit process, the difference in the poly1 and poly2 channel potentials is approximately
0.4V, with poly2 being higher. The consequence of the potential mismatch is that it can
result in incomplete transfer due to backspill. A more accurate depiction of the potential
levels along the array during charge transfer is illustrated in Figure 12-7. When a poly1
gate is brought low, a small amount of charge can be transferred backwards by being pulled
into the slightly higher potential region under the adjacent poly2 gate on the other side.
To avoid this problem, while keeping the clock driver circuits simple, a special level shift
circuit, shown in Figure 12-8, was added on chip to the poly2 clock lines. When the clock
phase is low and the level shift control signal is brought high, the clock line is pulled all the
way to ground. Since the low clock voltage used in the test system was $V_l$ = 0.6V, this causes
the channel potential under the poly2 gate to be slightly higher than that under the poly1 gate
which is being brought low, and thus prevents backspill. In the following clock cycle when the
poly2 gate is brought low, the control signal is held low so that the potential under the poly2
gate at $V_l$ will be slightly higher than that of the poly1 gate behind it. As a result, there is
always some small barrier preventing charge from travelling in the wrong direction.

Figure 12-7: Backspill caused by potential mismatch.

Figure 12-8: Level shift circuit for correcting potential mismatch problem.

12.1.3  Boundary  processing  and  charge  input 

The  cells  on  the  array  boundary  are  different  from  those  of  the  interior,  primarily  because 
they  are  missing  one  or  more  neighbors.  In  addition,  the  cells  along  the  west  boundary  of 
the  array  contain  the  charge  input  structure  and  the  vertical  shift  register  for  moving  the 
input  signals  to  the  appropriate  row.  Only  the  north  and  west  boundary  cells,  shown  in 
Figures  12-9  and  12-11,  contain  differencing  and  threshold  circuits — the  north  cell  for  the 
horizontal  pair  at  the  base  of  the  cell,  and  the  west  cell  for  the  vertical  pair  on  the  right 
side  of  the  cell. 

Smoothing on the boundary cells is inhibited in the direction away from the array by
keeping the gates normally used for mixing at a low voltage. For the north and south cells,
these gates are simply grounded, while on the east and west cells, they are connected to a
clock signal, $\phi_{SG}$ (for stop gate), which is held low during the smoothing operations. An
independent signal is necessary for the stop gates in the east and west boundary cells since
they must be clocked to allow charge to be moved into and out of the array.

When loading and unloading images, $\phi_{SG}$ is driven identically to $\phi_3$ in the lateral transfer
sequence. An image is loaded column by column, shifting the entire contents of the array
one node to the east as the next column is entered. Any charge previously held in the array
is also moved, and in particular, the charge held under the node gate of the east boundary
cell, shown in Figure 12-10, is dumped out into the high potential n+ diffusion next to
the stop gate. In order to have the option of measuring the signal levels of the smoothed
image as it is removed from the array, the floating gate amplifier outputs on the east cells
are connected to an output pad driver and can thus be sensed off-chip. To use this option,
the horizontal transfer sequence must be modified to include turning off the reset transistor
and allowing the node gate to float.

Figure 12-9: Boundary cell design (north).

Figure 12-10: Boundary cell design (east).

Figure 12-11: Boundary cell design (west).

Charge input occurs in the northwest corner of the array using the fill-and-spill structure
diagrammed in Figure 12-12, with the layout given in Figure 12-13. The operation of this
structure is exactly as described in Section 11.6.1, except for the names of the clock phases.
Signal charge is supplied via the diffusion at the top of the structure. The reference
voltage, $V_{ref}$, is connected to the poly1 gate next to the diffusion, while the input voltage,
$V_{in}$, is connected to the large, 30µm × 30µm, poly2 gate. The area of the input gate is the
same as that of the node gates within the array to ensure that enough electrons can be
supplied to achieve the full range of the floating gate amplifiers.

The vertical shift register on the west boundary cells is clocked using a standard four-phase
method with the signals $\phi_{s1}$, $\phi_{s2}$, $\phi_{s3}$, and $\phi_{s4}$, and is connected to the fill-and-spill
structure via the two transfer gates clocked by $TG_1$ and $TG_2$, and the stop gate controlled
by $SG$. After the fill-and-spill operation, the input charge is transferred into the shift
register by clocking $TG_1$, $TG_2$, and $\phi_{s4}$ high and raising the stop gate voltage, $SG$, to an
intermediate level. In order to achieve complete transfer without spilling charge back into
the input gate, two conditions must be satisfied. The first is that the potentials under the
input gate, stop gate, and transfer gates must be cascaded such that

$$\phi_{in} < \phi_{SG} < \phi_{TG_1} \qquad (12.10)$$

while the second condition is that the maximum signal charge must be contained entirely in
the potential well under the two transfer gates and the first shift register gate controlled by
$\phi_{s4}$. Since the combined area of these gates is 756µm², the electron density per µm² of gate
area for a maximum signal of $3.1 \times 10^6$ electrons is $N_e$ = 4100 electrons/µm². From the
Orbit process parameter values given in Table 12.1 and the equations developed in Section 11.2,
the voltage difference between the transfer and stop gates must be approximately 2V. With
a high clock level of $V_h$ = 4.5V, the maximum value of $V_{SG}$ must therefore be < 2.5V,
while the voltage difference between the input and reference gates, corrected for the 0.4V
poly1-poly2 potential mismatch, must be 1.4V to input the maximum number of electrons.
Condition (12.10) is thus satisfied for all inputs with $V_{ref}$ = 0.7V < $V_{in}$ + 0.4V < 2.5V.
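The numbers in this argument can be reproduced with a short calculation. The sketch below takes the combined gate area as 756µm², which is the value consistent with the quoted density of 4100 electrons/µm², and it reads the mismatch correction as adding 0.4V to the poly2 input gate voltage; both readings, and all variable names, are assumptions made for illustration.

```python
max_signal_electrons = 3.1e6        # maximum signal charge (electrons)
well_area_um2 = 756.0               # assumed combined area of TG1, TG2 and the phi_s4 gate
density = max_signal_electrons / well_area_um2   # electrons per um^2 in the receiving well

v_high = 4.5                        # high clock level V_h (V)
transfer_stop_difference = 2.0      # required transfer-gate to stop-gate difference (V)
v_sg_max = v_high - transfer_stop_difference     # upper limit on the stop gate voltage

v_ref = 0.7                         # reference gate voltage (V)
mismatch = 0.4                      # poly1-poly2 channel potential mismatch (V)
input_difference = 1.4              # corrected input-to-reference difference for full charge (V)
v_in_max = v_ref + input_difference - mismatch   # largest input gate voltage needed

print(f"electron density    = {density:.0f} electrons/um^2")
print(f"stop gate limit     = {v_sg_max:.1f} V")
print(f"largest V_in + 0.4V = {v_in_max + mismatch:.1f} V")
```

With these values the corrected input level stays below the stop gate limit, which is the cascade required by condition (12.10).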

The transfer into the shift register is completed by bringing $V_{SG}$ back to its low level
of 0V and then clocking $TG_1\downarrow$, followed by ($TG_2\downarrow$, $\phi_{s1}\uparrow$). The next clock sequence to be
performed depends on whether or not the input value is the last pixel of its column. If so,
bringing $\phi_{s4}$ low positions the signal charge for entry into the top row of the array, so that
executing the horizontal transfer sequence (with $\phi_{s1}$ initially clocked as $\phi_1$) will move the
column of data in the shift register to the first column of node gates. Otherwise, before the
next pixel can be loaded, the charges in the shift register must be shifted down one row by
executing four times the sequence: ($\phi_{s4}\downarrow$, $\phi_{s2}\uparrow$), ($\phi_{s1}\downarrow$, $\phi_{s3}\uparrow$), ($\phi_{s2}\downarrow$, $\phi_{s4}\uparrow$), and ($\phi_{s3}\downarrow$,

Figure 12-14: Multi-scale veto edge detection circuit block diagram.

12.2  Edge  Detection  Circuit 

In detecting edges using the multi-scale veto rule, differences are computed between
the values at neighboring pixels after each smoothing cycle, and an edge is signalled if
and only if the magnitude of the difference is above the threshold specified for the cycle
in each of the tests. The circuit implementation of the differencing, threshold, and veto
operations is shown in block diagram form in Figure 12-14. The outputs of the floating
gate amplifiers from two neighboring nodes are connected to the inputs of a double-ended
differential amplifier with gain $A$. The two outputs of the differential amplifier are equal
to $V_{oc} + A\Delta V$ and $V_{oc} - A\Delta V$, where $\Delta V = V_1 - V_2$ and $V_{oc}$ is the common-mode output
when $\Delta V = 0$. Since it is not known whether $\Delta V$ is positive or negative, the threshold
test is performed by comparing both outputs to a voltage representing the threshold plus
an offset to compensate for $V_{oc}$ as well as any systematic bias in the comparator. If the
threshold voltage is greater than both $+A\Delta V$ and $-A\Delta V$, the edge is vetoed by grounding
the input to the storage latch.
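The decision rule described above can be written out as a behavioral model. This is a sketch of the logic only, assuming an ideal linear amplifier and an ideal comparator; the default gain of 7.7 and common-mode output of 3.67V are the simulated values quoted later in this chapter, and the function names and input-referred thresholds are illustrative.

```python
def threshold_test(v1, v2, gain, v_oc, v_threshold):
    """Return True if the pair passes the test, False if the edge is vetoed."""
    dv = v1 - v2
    out_plus = v_oc + gain * dv      # first amplifier output
    out_minus = v_oc - gain * dv     # second amplifier output
    # The edge is vetoed only when the threshold exceeds both outputs.
    veto = v_threshold > out_plus and v_threshold > out_minus
    return not veto

def multi_scale_veto(pairs, thresholds, gain=7.7, v_oc=3.67):
    """Apply one threshold test per smoothing cycle; any failure vetoes the edge.

    pairs: (V1, V2) floating gate amplifier outputs after each cycle.
    thresholds: input-referred thresholds (V), one per cycle.
    """
    edge = True                      # latch initialized before the first test
    for (v1, v2), tau in zip(pairs, thresholds):
        # Threshold voltage = amplified threshold plus offset for V_oc.
        if not threshold_test(v1, v2, gain, v_oc, v_oc + gain * tau):
            edge = False             # once vetoed, the edge stays vetoed
    return edge

print(multi_scale_veto([(2.0, 1.9), (2.0, 1.95)], [0.04, 0.04]))
print(multi_scale_veto([(2.0, 1.9), (2.0, 1.98)], [0.04, 0.04]))
```

Because both outputs are tested, the result does not depend on the sign of $\Delta V$, only on its magnitude relative to the threshold at every cycle.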

Since space is limited in the unit cell, it is not practical to duplicate the comparator
circuit to perform both tests simultaneously. Instead, the tests are performed sequentially
with a single comparator by selectively gating the differential amplifier outputs using the
clock signals $R_1$ and $R_2$. The comparator output, which is high if the threshold voltage is
less than the differential output, is also selectively switched to one of the inputs of the NOR
circuit and is stored on the input gate capacitance when the switch is opened.

Figure 12-15: Clock waveforms for driving edge detection circuit.

The clock waveforms, including $R_1$ and $R_2$, used in the edge detection circuit are shown
in Figure 12-15. Signals $R_3$ and $R_4$ are used in the comparator circuit, which is discussed
in Section 12.2.2. The signal SET is used to initialize the edge storage latch, discussed in
Section 12.2.3, before starting the series of smoothing cycles and threshold tests, while the
signal CG is used to gate the result of each threshold test, once it is valid, to the storage
latch input.

12.2.1  Double-ended  differential  amplifier 

The role of the differential amplifier is to generate a signal proportional to the difference
in the input voltages which can be compared against a given threshold value. In order to
meet the resolution requirements discussed in Chapter 10, it is necessary for the range of
measurable differences to be between 2.5% and 10% of the full scale input range. Since the
floating gate amplifiers are designed for a full scale swing of 1.6V, the range of distinguishable
differences must be between at least 40mV and 160mV.

Figure 12-16: Differential amplifier circuit.

Figure 12-17: Simulated differential amplifier characteristic for common mode voltages
of 1.2V, 2.0V, and 2.8V ($V_{bias}$ = 1V).

If the differencing circuit is to be effective in detecting edges over the full range of input
values, two more requirements must be satisfied. First, the differential amplifier must have
a very high common mode rejection ratio so that a given difference $\Delta V$ corresponds to
approximately the same output signal for all common mode input levels. Second, the gain,
$A$, of the amplifier should be large enough to magnify the minimum input difference so that
it is greater than the minimum resolution of the comparator circuit. On the other hand,
$A$ cannot be so large that the amplifier output saturates when the input difference is less
than the maximum value which must be measured. These output constraints translate into
requiring the amplifier gain to be between ~2 and ~15.

The circuit diagram of the differential amplifiers used in the prototype MSV processor
is shown in Figure 12-16. It consists of two identical cascaded differential pairs with
diode-connected p-fet loads. The common mode output and common mode rejection ratios of
each pair are determined by the magnitude of the bias current and by the output resistance
of the bias transistor. The small signal differential gain, $A_d$, of each pair is equal to the
gain of the half-circuit composed of a single n-fet input stage with a p-fet load [98]. Let

$$g_{m1} = \sqrt{2 K_n (W/L)_1 I_D} \qquad (12.11)$$

be the transconductance of the n-fet input and

$$g_{m2} = \sqrt{2 K_p (W/L)_2 I_D} \qquad (12.12)$$

be that of the p-fet load, where $I_D$ is the drain current through both transistors, and the
factors $K_n$ and $K_p$ are given by $K_n = \mu_n C_{ox}$ and $K_p = \mu_p C_{ox}$, with $\mu_n$ and $\mu_p$ being
the respective electron and hole mobilities. Since the output voltage is equal to the gate
voltage of the p-fet, it is easily seen that

$$\frac{v_{od}}{v_{id}} = \frac{g_{m1}}{g_{m2}} = \sqrt{\frac{K_n (W/L)_1}{K_p (W/L)_2}} \qquad (12.13)$$

The input and load transistors in each pair were sized such that $(W/L)_1$ = 10λ/4λ
and $(W/L)_2$ = 4λ/8λ. With nominal values for the Orbit process of $K_n$ = 46.9 µA/V²
and $K_p$ = 17.0 µA/V², the theoretical differential gain is thus $A_d$ = 3.7. The advantage
of cascading the two low-gain differential amplifiers is that the combined differential and
common mode gains are the products of the individual terms. We thus obtain both a high
differential gain and a high common mode rejection ratio with half the input capacitance
loading the floating gate amplifier outputs.
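The theoretical gain follows directly from the ratio of the two transconductances, in which the drain current cancels. The short check below evaluates it from the quoted transistor sizes and nominal process factors; the variable names are illustrative.

```python
import math

k_n = 46.9e-6            # K_n = mu_n * C_ox (A/V^2), nominal Orbit value from the text
k_p = 17.0e-6            # K_p = mu_p * C_ox (A/V^2), nominal Orbit value from the text
wl_input = 10.0 / 4.0    # (W/L)_1 of the n-fet input transistor
wl_load = 4.0 / 8.0      # (W/L)_2 of the diode-connected p-fet load

# Gain of one differential pair: g_m1 / g_m2, independent of the drain current.
a_d = math.sqrt((k_n * wl_input) / (k_p * wl_load))

# Two identical pairs in cascade multiply their gains.
a_cascade = a_d ** 2

print(f"gain per pair = {a_d:.2f}")
print(f"cascaded gain = {a_cascade:.1f}")
```

The per-pair value rounds to the 3.7 quoted in the text, and the ideal cascaded value falls inside the ~2 to ~15 window required of the amplifier gain.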

The simulated output characteristics of the differential amplifier using the Spice level 2
parameters supplied by Orbit for the 10-28-93 run are shown in Figure 12-17 for common
mode inputs of 1.2V, 2.0V, and 2.8V. The gain $A_d$ of each pair determined by the simulation
was 1.97, with a combined gain of 7.70. The difference between these values and that
predicted by equation (12.13) is due to the use of a more accurate model which includes
second-order effects, such as channel geometry and threshold variations, by the HSPICE
simulation program.

The fact that the two output voltages are symmetric about $V_{oc}$ = 3.67V only for input
differences less than 10mV has no impact on the threshold test as only the range of the
positive difference, $V_o > V_{oc}$, is important. The output response shows good common-mode
rejection for the first 70mV of input difference and only a slight dependence on the common
mode level, which should not greatly affect the overall performance of the edge detector,
for differences between 70mV and 200mV.

12.2.2  Comparator  circuit 

The comparator and dynamic NOR circuits used for the threshold tests are shown in
Figure 12-18. The basis of the comparator is a standard clocked CMOS sense amplifier
developed for measuring small voltage differences in memory circuits [99]. When the clock
signal $R_3$ is low and $R_4$ is high, one of the gates of the back-to-back inverters is precharged
to the output value from the differential amplifier, while the other is precharged to the
voltage representing the threshold value. When $R_3$ is brought high, with $R_4$ still high,
the sources of the two n-fets at the base of the sense amp are grounded. The gate of the
transistor precharged to the higher of the two input voltages will pull more current initially,
and will thus bring its drain voltage to ground more quickly, than the other transistor whose
gate is precharged to the lower voltage. Since the drain of each transistor is connected to
the gate of the other, when one drain goes to ground it shuts off the opposing transistor,
preventing its drain from discharging more. One clock tick after $R_3$ is brought high, and
the drain-gate voltages on the transistors have settled, $R_4$ is brought low, connecting the
sources of the two p-fets at the top of the sense amp to $V_{dd}$. The process is then repeated
in reverse such that the side which was not completely discharged when $R_3$ was brought
high is now brought all the way to $V_{dd}$.

Figure 12-19: Sense amplifier response with resolution of $\delta$ = 10mV.

The sense amplifier thus produces two binary outputs, one for each input, which are high
when the corresponding input is greater than the other one and low when the corresponding
input is less. A real sense amplifier, however, has a finite resolution due to the $\sqrt{kTC}$
noise in charging the gate capacitances¹ and thus cannot measure differences in voltages
that are arbitrarily close together. The resolution, $\delta$, is defined for a given confidence level
$\alpha$ such that the sense amplifier will produce the correct output with probability $p > \alpha$
when the magnitude of the difference in the two inputs is greater than $\delta/2$. An example
response characteristic for the present situation is illustrated in Figure 12-19 where the
x-axis is plotted for $A|\Delta V|$, given $V_{oc}$ = 3.6V and assuming no systematic offset in the sense
amplifier. In between the dashed lines, where $V_{oc} + A|\Delta V| < V_\tau + \delta/2$ and $V_{oc} + A|\Delta V| >
V_\tau - \delta/2$, the probability that the sense amplifier output is correct is less than $\alpha$. The
resolution, $\delta$, shown in the diagram as 10mV, is the horizontal distance between the two
dashed lines and represents the minimum voltage difference which can be reliably measured.
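The definition of the resolution can be made concrete with a simple noise model. The sketch below assumes that the charging noise is zero-mean Gaussian with standard deviation `sigma_v` referred to the comparator input, and that the output is correct whenever the noisy difference keeps the sign of the true difference; this model and the numerical values are assumptions for illustration, not measured properties of the circuit.

```python
from statistics import NormalDist

def probability_correct(diff_v, sigma_v):
    """Probability that the comparison is decided correctly for a given
    input difference, under the Gaussian noise assumption."""
    return NormalDist(0.0, sigma_v).cdf(abs(diff_v))

def resolution(sigma_v, alpha):
    """Resolution delta: the output is correct with probability >= alpha
    whenever the input difference exceeds delta / 2."""
    return 2.0 * NormalDist(0.0, sigma_v).inv_cdf(alpha)

sigma_v = 0.003      # hypothetical input-referred noise (V)
alpha = 0.95         # hypothetical confidence level
delta = resolution(sigma_v, alpha)

print(f"resolution delta = {delta * 1e3:.2f} mV")
print(f"P(correct) at delta/2 = {probability_correct(delta / 2.0, sigma_v):.3f}")
print(f"P(correct) at zero difference = {probability_correct(0.0, sigma_v):.3f}")
```

In this model the region between the dashed lines of Figure 12-19 corresponds to differences smaller than `delta / 2`, where the probability of a correct decision falls from `alpha` toward one half.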

The sense amplifier output connected to the $V_\tau$ input side is fed into an inverter whose
output is gated to one of the inputs on the adjoining NOR circuit. The inverter, which
consists of a single n-fet with a p-fet load clocked by $R_4$, is used to isolate the sense amplifier
from the uneven capacitance of the NOR input gates. The fact that the inverter input itself
creates a capacitive imbalance between the two sides of the sense amplifier is unimportant
as the base level of the threshold voltage $V_\tau$ can be adjusted to compensate for the resulting
offset. The important issue is that the input capacitance to the inverter is constant, while
that of the NOR circuit depends on the result of the previous test. An analysis of the NOR
circuit shows that the gate-source capacitances, $C_{gs}$, of the two p-fets are each a function
of the charged state of the other transistors, as this determines whether or not there is a
conducting path through the circuit during the precharge phase of the sense amplifier.

Since the inverter output is low when $V_\tau$ wins the comparison, the NOR output will be
high only if the threshold voltage, $V_\tau$, is greater than both $V_{oc} + A\Delta V$ and $V_{oc} - A\Delta V$.
In this case the switch transistor at the bottom of the two transistor chain connected to
the edge storage latch input is turned on. The second transistor, which is clocked by the
signal CG, is turned on after both comparisons have been completed and the NOR output
is stable. If the NOR output is high, the storage latch input is connected to ground, and
the edge signal is discharged. If it is low, however, the lower switch is open, and raising CG
has no effect on the state of the storage latch.

¹ See Section 11.6.1 for a discussion of charging noise.

Figure 12-20: Edge charge storage latch.

12.2.3  Edge  storage 

The edge storage latch, which consists of a pair of cross-coupled p-fets along with two
n-fets for initializing and discharging the edge signal, is shown in Figure 12-20. At the
beginning of the multi-scale veto procedure, before any threshold tests are performed, the
latch is initialized by bringing the SET signal high. This action turns on the lower n-fet,
bringing its drain voltage to ground, and thereby turning on the upper left p-fet which pulls
the n-fet gate all the way to $V_{dd}$. When SET is brought low, the positive feedback of the
n-p transistor combination will maintain the charged state of the latch indefinitely as long
as the gate of the lower n-fet is not grounded.

If the edge is vetoed, however, i.e., if the NOR output is high, the n-fet gate will be
connected to ground when CG goes high. If this occurs, the upper right p-fet will be
turned on and will pull the drain of the n-fet to $V_{dd}$, shutting off the upper left p-fet. The
discharged state of the input to the CMOS inverter formed by the righthand n-p transistor
combination is also stable since the SET transistor is not turned on again for the remainder
of the edge detection procedure and there is no other mechanism for recharging the latch.

The  edge  signals  for  a  given  row  are  read  out  by  bringing  the  ‘Row  Select’  signal  high 
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which  connects  the  latch  output  to  the  bit  line  for  its  column.  The  bit  line  is  in  turn  con¬ 
nected  to  an  inverting  digital  output  pad  driver  to  bring  the  signal  off-chip.  The  current 
for  charging  and  discharging  the  pad  driver  is  supplied  by  the  CMOS  inverter  on  the  right- 
hand  side  of  the  latch.  Since  this  current  can  be  supplied  without  affecting  the  state  of  the 
inverter  input,  the  edge  signals  can  be  read  out  nondestructively  at  any  point  during  the 
edge  detection  procedure. 
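
The set, veto, and read behavior described above can be summarized in a small behavioral
model. This is only a sketch of the logic, not a circuit simulation; the class and method
names are hypothetical, and the value returned by `read` is the stored latch state rather
than the polarity seen after the inverting pad driver.

```python
class EdgeLatch:
    """Behavioral model of the edge storage latch (names are illustrative)."""

    def __init__(self):
        # Assumed power-up state; the hardware state is only defined after SET.
        self.charged = False

    def set(self):
        """SET pulsed high: the latch node is pulled to Vdd (edge present)."""
        self.charged = True

    def veto(self, nor_out, cg):
        """Discharge the latch only if the NOR output is high while CG is high."""
        if nor_out and cg:
            self.charged = False

    def read(self):
        """Row Select readout; it does not change the stored state."""
        return self.charged


latch = EdgeLatch()
latch.set()
latch.veto(nor_out=False, cg=True)   # threshold test passed: edge survives
survived = latch.read()
latch.veto(nor_out=True, cg=True)    # edge vetoed at a later scale
vetoed = latch.read()
latch.veto(nor_out=False, cg=True)   # nothing recharges the latch after a veto
still_vetoed = latch.read()
print(survived, vetoed, still_vetoed)
```

Reading between veto steps, as done here, mirrors the nondestructive readout: the stored
state is unchanged by `read`.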


Chapter  13 


Edge  Detector  Test  Results 


Several  fabrication  runs  through  the  Orbit  CCD  process  were  necessary  to  finalize 
and  debug  the  design  of  the  multi-scale  veto  edge  detector.  After  several  unsuccessful 
attempts  to  build  a  charge-based  absolute  value  of  difference  circuit,  similar  to  the  one 
used  by  Keast  [86],  with  the  required  resolution,  the  transistor-based  design  described  in 
Section  12.2.1  was  developed.  Tinychips^  containing  test  structures  from  the  new  circuit 
were  sent  out  on  6-30-93,  and  based  on  the  results  from  this  run,  a  32x32  array — which  is 
the  largest  size  that  could  fit  on  the  maximum  7.9mm  x  9.2mm  die — along  with  a  second 
full-size  chip  containing  isolated  test  structures,  were  sent  out  for  fabrication  on  10-28-93. 

Using the test system described in the next section, which was designed based on the
layout of the 32x32 array processor, it was possible to obtain a more accurate
characterization of the CCD performance than had been available from the much simpler test setup
used with the Tinychips. Several previously unnoticed problems, such as charge trapping
in the input shift register and the backspill problem discussed in Section 12.1.2, were thus
discovered, and a final design, with these minor issues corrected, was sent out on 4-27-94.
The  die  photographs  of  the  fabricated  32x32  array  and  the  full-size  test  structures  chips  are 
shown  in  Figures  13-1  and  13-2.  The  test  results  presented  in  this  chapter  were  obtained 
from  the  chips  returned  from  this  latest  run. 


^Tinychip is a low-cost 2.22mm x 2.25mm die size offered by MOSIS for test designs.

Figure 13-1: Die photograph of 32x32 array.

Figure 13-2: Die photograph of the test structures chip.

13.1 Test System

In  order  to  test  the  prototype  multi-scale  veto  chip  it  was  necessary  to  supply  a  total  of 
31  programmable  control  signals: 

• The six clock phases for driving the CCD array, plus the floating-gate reset
signal, Vfg, and the control signal for the backspill prevention circuit;

• The edge detection control signals: R1, R2, R3, R4, CG, and SET;

• The input and shift register clock phases: TG1, TG2, and ^5g, and the two
control signals for switching the variable analog waveforms connected to the stop gate,
SG, and the input diffusion, Vd; and

• Eight signals for selecting the row of edge outputs to be read and for enabling and
pre-charging the output drivers.

It was also necessary to supply programmable analog waveforms for the input and reference
voltages,  Vin  and  Vref,  as  well  as  for  the  edge  threshold  voltages,  Vr,  and  to  read  and  digitize 
the  analog  voltages  from  the  floating  gate  amplifiers  representing  the  smoothed  brightness 
values.  Finally,  a  wide  data  path  connected  to  a  high-speed  memory  buffer  was  needed  to 
store  the  edge  outputs  as  they  are  read  out. 

The  test  system  designed  to  perform  these  functions  is  illustrated  in  Figure  13-3.  In 
order  to  preserve  maximum  flexibility  and  to  minimize  programming  time,  the  system  was 
built  around  a  DELL  486  personal  computer  housing  three  commercially-made  boards  for 
facilitating the interface to the device under test (DUT). The first of these boards,
manufactured by DATEL, Inc., is a 4-output power supply for driving the MSV processor and
its supporting circuitry. The second, also manufactured by DATEL, Inc., is an I/O board
capable of digitizing 16 independent analog inputs to 12 bits with a conversion time of 12µs
per input. It can also supply 4 independent analog outputs from 12-bit digital data stored
in 4 on-board registers with a settling time of 5µs to 0.05% of full scale range. The third
board,  made  by  White  Mountain  DSP,  Inc.,  is  a  TI  TMS320C40-based  evaluation  board, 
originally  designed  for  testing  applications  using  the  ‘C40  microprocessor  before  building  a 
stand-alone  system.  In  the  present  test  system,  this  board  turned  out  to  be  very  useful  due 
to  its  32-bit  bus  interface  and  access  to  the  ‘C40  read/write  control  signals  via  the  96-pin 
on-board Eurocard connector. It also has a high-speed internal path to an on-board 4K
dual-port SRAM which can be accessed by the PC as shared memory. All three boards
communicate with the DELL over its PC/AT bus and can be programmed in a high-level
language. In addition, programs for the ‘C40 evaluation board can be loaded and executed
independently of the DELL’s 486 microprocessor so that separate programs can be executed
asynchronously to drive the clock waveforms for the test system and to send and acquire
data over the I/O board.

Figure 13-3: Test system design (blocks shown: DELL 486 PC housing, Mountain-40 C40
evaluation board, daughter board).

The  custom-designed  portions  of  the  test  system  include  a  pair  of  boards  outside  the 
PC  which  contain  the  device  under  test,  circuitry  for  generating  and  switching  the  constant 
analog  bias  voltages  used  in  the  processor,  and  the  clock  driver  circuits.  In  addition,  a 
daughter  board,  which  mates  with  the  ‘C40  evaluation  board  via  the  96-pin  connector,  was 
designed  to  manage  the  flow  of  data  across  the  32-bit  bus.  The  daughter  board  contains 
a  bank  of  registers  where  the  values  for  the  31  control  signals  for  the  MSV  processor  are 
latched  and  a  set  of  tri-state  bus  drivers  used  to  transmit  the  edge  outputs  over  the  bus 
to  the  internal  SRAM.  One  32-bit  path  connects  a  set  of  tri-state  buffers  on  the  device 
board  to  the  bus  drivers  on  the  daughter  board.  Since  the  prototype  processor  contains  a 
32x32  array,  there  are  63  edge  outputs  per  row — 32  vertical  and  31  horizontal — which  are 
output  simultaneously.  The  external  buffers  are  used  to  select  one-half  of  the  edge  signals 
to  transmit  to  shared  memory  during  one  read  cycle. 
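
The two-cycle readout of one row can be illustrated with a short sketch. The text only
states that 63 edge outputs are split into halves and sent over a 32-bit path; the bit
ordering, the zero padding of the 64th bit, and the function names below are assumptions
made for illustration.

```python
EDGES_PER_ROW = 63   # 32 vertical + 31 horizontal edge outputs (from the text)
WORD_BITS = 32       # width of the path from the device board to the daughter board


def split_row(edges):
    """Pack one row of edge bits into two bus words (hypothetical bit order)."""
    if len(edges) != EDGES_PER_ROW:
        raise ValueError("expected 63 edge outputs per row")
    padded = [1 if e else 0 for e in edges] + [0]   # pad to 64 bits
    words = []
    for half in (padded[:WORD_BITS], padded[WORD_BITS:]):
        word = 0
        for bit_index, bit in enumerate(half):
            word |= bit << bit_index
        words.append(word)
    return words


def join_row(words):
    """Recover the 63 edge bits from the two words read in successive cycles."""
    bits = [(w >> i) & 1 for w in words for i in range(WORD_BITS)]
    return bits[:EDGES_PER_ROW]


row = [i % 3 == 0 for i in range(EDGES_PER_ROW)]
words = split_row(row)
recovered = join_row(words)
print([hex(w) for w in words])
```

Each word fits the 32-bit bus, so a complete row takes two read cycles into shared memory.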

A  second  32-bit  path  connects  the  31  register  outputs  and  the  system  ground  on  the 
daughter  board  to  the  clock  drivers  that  generate  the  control  signal  waveforms  for  the 
device  under  test.  External  drivers  are  necessary  as  the  register  outputs  are  neither  clean 
enough  nor  strong  enough  to  drive  the  CCD  gates  and  the  edge  detection  circuits  directly. 
For  proper  operation,  it  is  important  to  control  the  rise-  and  fall-times  of  the  waveforms 
and  to  minimize  ringing.  The  drivers  must  also  supply  enough  current  to  bring  the  highly 
capacitive  loads  formed  by  the  CCD  gates  to  their  final  levels  within  the  time  allowed. 

The  clock  drivers  were  built  based  on  a  standard  circuit,  shown  in  Figure  13-4,  used  in 
some CCD cameras and described in reference [100]. This circuit uses a National
Semiconductor DS0026 chip containing two drivers which, when supplied with TTL digital inputs,
will produce an inverted output capable of driving a 1000pF load at 5MHz. A 2kΩ
potentiometer is used to adjust the rise- and fall-times of the clock signals to approximately
100ns, and a diode protection circuit prevents the clock waveform from ringing below the
MSV chip substrate voltage, Vss. This protection circuit is crucial to prevent the on-chip
input protection diodes from turning on and causing charge injection into the substrate
which can then be collected in the CCD array.

Figure 13-4: Clock driver circuit used in the test system

One  disadvantage  of  the  test  system  as  it  is  designed  is  that,  despite  the  40ns  instruction 
cycle  time  of  the  ‘C40,  the  minimum  control  signal  pulse  width  is  limited  to  800ns  both 
by the programming overhead and by the propagation delays through the system. It was
thus impossible to test how much faster than 625kHz the MSV processor could be
operated.  Convenience  and  flexibility  in  debugging  the  design  of  the  processor  itself  were 
determined  to  be  more  important,  however,  than  optimizing  the  test  system  for  speed. 
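
The relation between the 800ns pulse width and the 625kHz figure can be checked with a
few lines of Python. The sketch assumes that one clock period consists of one high and one
low pulse, each of minimum width; that reading is an interpretation, not something the
text states explicitly.

```python
MIN_PULSE_WIDTH_S = 800e-9      # minimum control signal pulse width (from the text)
INSTRUCTION_CYCLE_S = 40e-9     # 'C40 instruction cycle time (from the text)


def max_clock_frequency(pulse_width_s):
    """Assume one period = one high pulse + one low pulse of minimum width."""
    return 1.0 / (2.0 * pulse_width_s)


f_max_hz = max_clock_frequency(MIN_PULSE_WIDTH_S)
cycles_per_pulse = MIN_PULSE_WIDTH_S / INSTRUCTION_CYCLE_S
print(f"maximum clock frequency: {f_max_hz / 1e3:.0f} kHz")
print(f"instruction cycles per pulse: {cycles_per_pulse:.0f}")
```

Under this assumption each control pulse costs about 20 instruction cycles, which gives a
sense of the overhead mentioned above.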

13.2 Driving Signals Off-Chip

Two types of output pad drivers, one digital and one analog, were used to drive signals
from the processor off-chip. The digital pad drivers, which are used with the edge signal
bit lines, are simply large CMOS inverters designed to drive a 20pF load from 0V to 4V, or
from 5V to 1V, in less than 10ns for a ±.5V step input. The input capacitance of the digital
driver, as seen from the edge storage latches in the array, is approximately 2.5pF, of which
only .5pF is due to the gate capacitance of the inverter and the other 2pF is due to the
capacitance of the bit line itself. Simulation results indicate that the storage latch can drive
the inverter input from 5V to < 1V in 35ns, and from 0V to > 4V in 55ns. The rise and fall
times of the edge output signals on the test chips, measured using an oscilloscope from the
time that the ‘Row Select’ signal was brought high, were confirmed to be approximately
the same as predicted by the simulation. Given the 800ns control signal pulse widths used
in the test system, the edge outputs had ample time to become stable before being read
into the ‘C40 SRAM.

Figure 13-5: Analog output pad driver circuit

The  analog  pad  drivers  were  designed  to  buffer  the  output  voltages  from  the  different 
isolated  structures  laid  out  on  the  test  chip,  as  well  as  from  the  floating  gate  amplifiers  on 
the  east  boundary  of  the  array  which  were  used  to  output  the  smoothed  image  data.  Since 
the  original  voltages  could  only  be  recovered  by  correcting  the  measured  signals  for  the 
pad  driver  response,  it  was  very  important  to  achieve  good  matching  between  the  different 
drivers  in  order  to  accurately  measure  the  responses  of  the  on-chip  structures. 

The  analog  pad  driver  circuit,  which  is  shown  in  Figure  13-5,  consists  of  a  pair  of 
cascaded  n-  and  p-type  source  followers.  To  achieve  good  matching,  the  transistors  were 
drawn with large W/L ratios to minimize the percentage effects of process variations. Since
the on-chip structures generally pull very little current, it was important to minimize the
additional  load  capacitance  due  to  the  drivers.  The  total  gate  capacitance  of  the  first  stage 
is  approximately  230fF,  which  is  almost  7  times  less  than  the  1.5pF  capacitance  of  the 
second  stage  that  drives  the  output  load.  The  metal2  lines  leading  to  the  pad  drivers, 
which  are  much  shorter  than  the  edge  signal  bit  lines  that  run  across  the  chip,  have  an 
average capacitance of approximately 120fF, giving a total load of about 350fF.

Figure 13-6: Analog output pad driver characteristic - simulated vs. actual (Vlow = 1.2V,
Vhigh = 3.8V).


The  input  stage  was  chosen  as  an  n-type  device  in  order  to  sense  the  full  range  of 
output  voltages  from  the  various  on-chip  structures,  which  are  between  1.5V  and  5V.  Since 
the  voltage  levels  are  shifted  down  by  the  first  stage,  the  second  stage  could  be  built  with 
a  higher  gain  p-type  device  laid  out  with  separate  wells  for  each  transistor  to  eliminate  the 
backgate effect. The total power dissipation of each driver is approximately 390µW and is
the highest of any structure on the chip. Separate Vdd and GND rails were thus drawn
so  that  the  relatively  large  currents  pulled  by  the  drivers  would  not  affect  the  circuits 
on  the  rest  of  the  chip  and  also  so  that  the  actual  power  dissipation  could  be  measured 
independently. 

Figure  13-6  shows  the  results  of  test  measurements  from  24  different  pad  drivers  on 
12 different chips compared with the performance predicted by simulation. The solid line
represents the average output from all 24 drivers. The standard deviation of the individual
responses  from  the  average  curve  is  8mV  with  a  maximum  absolute  variation  of  17.6mV. 
The  average  measured  gain  is  0.877,  which  is  slightly  better  than  the  predicted  value  of 
0.848. 

13.3  CCD  Tests 

Correct  operation  of  the  MSV  processor  depends  critically  on  the  device  characteristics 
of  the  CCD  structures  used  in  the  array.  The  parameters  which  most  affect  signal  processing 
ability  are  the  magnitudes  of  the  channel  potentials,  the  amount  of  dark  current,  and  the 
charge  transfer  efficiency.  Isolated  test  structures  were  laid  out  to  measure  each  of  these 
parameters, as well as to measure the input/output characteristics of the fill-and-spill
and  floating-gate  amplifier  structures  used  to  interface  with  the  array. 

13.3.1  Channel  potential  measurements 

To measure the maximum potential, φmax, in the buried channel as a function of the
applied gate voltage, a separate structure was laid out on the test chip containing two CCD
‘transistors’ formed by placing n+ diffusions on either side of an isolated polysilicon gate
covering a segment of buried channel. Two such transistors were needed, one for each of the
two polysilicon layers. The four diffusions were connected via metal contacts to separate
unprotected  and  unbuffered  pins  so  that  their  voltages  could  be  measured  directly. 

To measure the potentials, the ‘source-follower’ method, described by Taylor and Tasch
in [101], was used, as this approach was the simplest to apply given the test system setup.
In this method, the substrate is grounded and a voltage is applied to the gate of each CCD
transistor. The diffusions on one side of the transistors, acting as the drains, are set to
a high voltage, while the source diffusions are connected to ground via a high impedance
load. In this situation, the transistors are at the threshold of conduction so that the source
voltage is equal to φmax. Due to the potential differences across the metal contacts and the
n-n+ and p-p+ junctions, the measured voltage is Vs - Vbi, where Vbi is the built-in voltage
across the n-p junction at the channel-substrate interface. The channel potentials are thus
recovered by adding 0.8V, which is the value given by Orbit for Vbi, to the measured values.

The φmax-Vg curves obtained for the chips from the 4-27-94 and the 10-28-93 runs are
plotted in Figures 13-7 and 13-8. A process change was made by Orbit for the 4-27-94 run
which had the effect of lowering the potentials by approximately 2V from those of the earlier
run. This decrease does not greatly affect the results of the calculations in Sections 12.1.1
and 12.1.3 used to determine the sizes of the input and floating-gate structures, as these
depend mostly on the differences in the channel potentials at different gate voltages rather
than on their actual value. It does, however, affect the charge-carrying capacity of the CCD
gates, and therefore the signal-to-noise ratio of the devices.

Figure 13-7: CCD channel potentials (4-27-94 run)

Figure 13-8: CCD channel potentials (10-28-93 run)

As can be seen from the diagrams, there is little change in the slopes of the φmax-Vg
curves between the two runs. The less than unity slope of .885 for poly1 and .877 for poly2
is due to the capacitive divider between Ceff and Cd2 explained in Sections 11.2 and 11.6.2.
Since

\[
\frac{\Delta\phi_{max}}{\Delta V_g} = \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff}} \tag{13.1}
\]

we have (for poly1)

\[
\frac{C_{d2}}{C_{eff}} = \frac{1}{.885} - 1 \approx .13 \tag{13.2}
\]

which is less than half the value calculated from the estimated design parameters given in
Table 12.2.
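
The step from the measured slope to the capacitance ratio can be made explicit. The
sketch below assumes the capacitive-divider relation of equation (13.1), slope =
(1/Cd2)/(1/Cd2 + 1/Ceff) = 1/(1 + Cd2/Ceff), and simply inverts it; the function name is
illustrative.

```python
def cd2_over_ceff(slope):
    """Invert slope = 1 / (1 + Cd2/Ceff) for the ratio Cd2/Ceff."""
    if not 0.0 < slope <= 1.0:
        raise ValueError("a capacitive divider slope must lie in (0, 1]")
    return 1.0 / slope - 1.0


ratio_poly1 = cd2_over_ceff(0.885)   # measured poly1 slope (from the text)
ratio_poly2 = cd2_over_ceff(0.877)   # measured poly2 slope (from the text)
print(f"poly1: Cd2/Ceff = {ratio_poly1:.3f}")
print(f"poly2: Cd2/Ceff = {ratio_poly2:.3f}")
```

The poly1 result rounds to the value of .13 used in the later capacitance estimates; the
poly2 slope gives a slightly larger ratio.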


13.3.2  Input  and  output 

The  fill-and-spill  and  floating-gate  amplifier  structures  were  tested  jointly  by  combining 
both  devices  in  a  single  1-D  gate  array.  A  separate  test  structure  containing  only  the  source- 
follower  buffer  used  in  the  floating-gate  amplifier  was  also  laid  out  on  the  test  chip  so  that 
it  could  be  characterized  independently. 

The measured source-follower response from one chip is plotted in Figure 13-9, with
the dashed line indicating the measured voltages, and the solid line representing the data
corrected for the analog pad driver response. The average gain measured from twelve devices
on twelve different chips was 0.895, which is considerably higher than the value of 0.800
predicted by the simulation shown in Figure 12-5. This discrepancy arises because the
actual values of the process parameters γ = sqrt(2 q εsi NA)/Cox and λ, the channel length
modulation parameter, both of which determine the magnitude of gmb, were smaller than those
used in the simulation. The values given for the 4-27-94 run were γ = 0.4493 and λ = 0.0304, as
opposed to the values from the 10-28-93 run given as γ = 0.4977 and λ = 0.0318. Matching
between the different source-followers was relatively good, with an overall standard deviation
of 14.5mV from the average output values, and a maximum deviation of 37mV.

Figure 13-9: Measured source follower characteristic for floating gate amplifier (Vbias =
1V).

The  results  of  the  combined  fill-and-spill  and  floating-gate  amplifier  structures  from 
twelve  different  chips  are  shown  by  the  individual  points  in  Figure  13-10,  with  the  solid 
line  representing  the  average  output.  The  values  are  plotted  against  the  difference  in  the 
voltages  applied  to  the  input  and  reference  gates.  Since  these  gates  are  in  different  levels 
of  poly  (see  Figure  12-13  for  the  layout  of  the  fill-and-spill  structure),  the  plot  begins  for 
Vin - Vref = -0.3V, which is approximately the magnitude of the potential mismatch
between  the  two  levels. 

The standard deviation of the different outputs from the average is 31mV with a
maximum deviation of 121mV. It should be noted that the differences in the outputs are a
combination of the mismatches between the source-follower buffers and the variations in
the channel potentials of the fill-and-spill input devices. Since there is only one input
structure for each processor, differences in the floating-gate amplifier responses within the same
array should be due only to mismatches in the source-followers.

Figure 13-10: Measured floating gate amplifier output vs. fill-and-spill input, Vin - Vref.

The total output swing of the floating-gate amplifiers is 2V, from a maximum of 2.8V
for Vin - Vref = -0.3V to the minimum value of 0.8V at Vin - Vref = 1.4V. The average
slope of the response curve over this range is -1.48. Correcting for the source-follower gain
of 0.895 thus gives the change in the floating gate voltage per unit change in the applied
signal voltage as

\[
\frac{\Delta V_g}{\Delta V_{sig}} \approx -1.65 \tag{13.3}
\]

where Vsig = Vin - Vref + $\Delta\phi_{p1-p2}$, with $\Delta\phi_{p1-p2}$ representing the poly1-poly2 mismatch.
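
The correction for the source-follower gain is a single division, sketched here using only
the two numbers quoted in the text. The variable names are illustrative.

```python
measured_slope = -1.48          # average slope of the amplifier output curve (from the text)
source_follower_gain = 0.895    # average measured source-follower gain (from the text)

# Dividing out the buffer gain refers the slope back to the floating gate itself.
gate_slope = measured_slope / source_follower_gain
print(f"floating-gate slope: {gate_slope:.2f} V/V")
```

The result agrees with the value of -1.65 given in equation (13.3) to the precision quoted.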
From equations (11.28) and (11.53), we have

\[
\Delta V_g = \frac{Q_{sig}}{C_{load}} \left( \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff_{p1}} + 1/C_{load}} \right) \tag{13.4}
\]

\[
Q_{sig} = -C_{eff_{p2}} V_{sig} \tag{13.5}
\]

where the distinction has been made between the effective capacitances of the poly1 and
poly2 structures. Combining these equations gives

\[
\frac{\Delta V_g}{\Delta V_{sig}} = -\frac{C_{eff_{p2}}}{C_{load}} \left( \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff_{p1}} + 1/C_{load}} \right) \tag{13.6}
\]


Using the value of $C_{d2}/C_{eff_{p1}}$ = .13 computed from the channel potential
measurements, equations (13.3) and (13.6) are consistent with


Cload 


(13.7) 


If we take the value of 9.2 × 10^-17 F/µm² computed for Cload in equation (12.5), we thus
have $C_{eff_{p2}}$ = 1.8 × 10^-16 F/µm². With $\Delta\phi_{p1-p2}$ = 0.3V and 1.4V being the
maximum value of Vin - Vref, we find

\[
\frac{|Q_{sig,max}|}{q} = 1912\ \mathrm{electrons}/\mu m^2 \tag{13.8}
\]

The input gate area being 900µm², the maximum number of signal electrons is
approximately 1.7 × 10^6.
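
The electron count can be cross-checked numerically. The sketch below assumes an
effective capacitance of 1.8 × 10^-16 F/µm², a maximum signal voltage of 1.4V + 0.3V, and
the 900µm² gate area; the exponent of the capacitance is read from a damaged passage and
is treated here as an assumption that the arithmetic supports.

```python
ELECTRON_CHARGE_C = 1.602e-19     # elementary charge in coulombs

c_eff_f_per_um2 = 1.8e-16         # effective capacitance per unit area (assumed exponent)
v_sig_max = 1.4 + 0.3             # max (Vin - Vref) plus the poly1-poly2 mismatch, in volts
gate_area_um2 = 900.0             # input gate area (from the text)

# |Q| = C * V per unit area, converted to a number of electrons.
electrons_per_um2 = c_eff_f_per_um2 * v_sig_max / ELECTRON_CHARGE_C
total_electrons = electrons_per_um2 * gate_area_um2
print(f"{electrons_per_um2:.0f} electrons per square micron")
print(f"{total_electrons:.2e} electrons in a full packet")
```

With these inputs the density comes out within a few electrons of the 1912 quoted in
equation (13.8), the small difference being due to rounding of the capacitance.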


13.3.3  Dark  current  measurements 

In  order  to  use  CCDs  for  signal  processing,  all  computations  must  be  completed  in  less 
time  than  it  takes  for  dark  current  to  appreciably  affect  the  signals.  Dark  current  levels  for 
the  test  chips  were  measured  using  the  floating-gate  amplifier  structure  by  initializing  the 
node  gate  to  its  high  voltage  and  then  allowing  it  to  float  without  introducing  charge  via 
the  gate  array.  Since  dark  current  is  the  only  source  of  charge  into  the  potential  well  under 
the floating gate, the magnitude of the current can be estimated by measuring the change
in the gate voltage over time.

Figure 13-11: Dark current accumulation for cooled and uncooled chips.

Figure  13-11  shows  the  results  of  two  tests,  one  with  the  chip  at  room  temperature  and 
the  other  with  the  chip  cooled  to  approximately  0°C.  The  output  voltages  were  sampled  at 
125µs intervals for 0.25 seconds. The values plotted are the computed gate voltages after
correcting  for  the  responses  of  both  the  source-follower  on  the  floating  gate  amplifier  and 
the  output  pad  driver.  From  equation  (13.4),  the  slopes  of  the  curves  are  related  to  the 
dark  current  density,  Jd  by 

\[
\frac{\Delta V_g}{\Delta t} = \frac{J_d}{C_{load}} \left( \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff_{p1}} + 1/C_{load}} \right) \tag{13.9}
\]

Using the previously estimated values for Cload and $C_{d2}/C_{eff_{p1}}$, we thus have

\[
J_d = \frac{\Delta V_g}{\Delta t} \times 1.1 \times 10^{-8}\ \mathrm{A/cm^2} \tag{13.10}
\]

For the chips at room temperature with a slope of ΔVg/Δt = -1.74V/sec, Jd is
approximately -19nA/cm², while for the cooled chips, with ΔVg/Δt = -0.38V/sec, Jd is only
-4.2nA/cm². It should be noted that the uncooled dark current levels fluctuate
tremendously as both room and chip operating temperatures vary. The uncooled values, however,
were never measured at less than -1.4V/sec and were often as high as -2.5V/sec. These
values are high with respect to commercial grade CCDs, but are not unusual for the Orbit
process which is not optimized for dark current^. For tests involving total processing times
of more than 5ms, it was thus necessary to cool the chips to achieve proper operation.
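
The conversion from measured slope to dark current density, and the size of the drift over
a 5ms processing window, can be sketched as follows. The scale factor of 1.1 × 10^-8
(A/cm² per V/sec) is taken from equation (13.10); its exponent is inferred from the cooled
chip figures quoted above and should be read as an assumption.

```python
SCALE_A_PER_CM2_PER_V_PER_S = 1.1e-8   # factor in equation (13.10), exponent assumed


def dark_current_density(slope_v_per_s):
    """Dark current density in A/cm^2 from the floating-gate voltage slope."""
    return slope_v_per_s * SCALE_A_PER_CM2_PER_V_PER_S


room_slope = -1.74     # V/sec, uncooled chip (from the text)
cooled_slope = -0.38   # V/sec, chip cooled to about 0 degrees C (from the text)

jd_room_na = dark_current_density(room_slope) * 1e9
jd_cooled_na = dark_current_density(cooled_slope) * 1e9

# Gate voltage drift accumulated over a 5 ms processing window at room temperature.
drift_5ms_v = abs(room_slope) * 5e-3

print(f"room temperature: {jd_room_na:.1f} nA/cm^2")
print(f"cooled:           {jd_cooled_na:.1f} nA/cm^2")
print(f"drift in 5 ms:    {drift_5ms_v * 1e3:.1f} mV")
```

For scale, the uncooled drift over 5ms is several millivolts, which is not negligible next
to signal differences of a few tens of millivolts.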


13.3.4  Transfer  efficiency 

In order to measure charge transfer efficiency, a special structure composed of a chain
of 27 1-D unit cells together with a boundary cell containing a fill-and-spill input device
was laid out on the test chip. The 1-D cells are simply truncated and rearranged versions
of 2-D unit cells which are missing the vertical sections. The layout of these cells is such
that, when placed end-to-end, a linear array is formed which is identical in operation to the
2-D array, with the exception that there is no vertical movement of charge. Since each cell
contains 12 gates, the signal charge in the test array is transferred through 336 gates from
the end of the fill-and-spill structure to the final node gate, whose floating-gate amplifier
output is connected to one of the analog pad drivers.

As  explained  in  Section  11.4,  charge  transfer  efficiency  can  be  measured  from  the  ratio 
of  the  size  of  the  charge  packet  under  the  Nth  transfer  gate  to  the  original  packet  size  at 
gate  0.  From  equation  (11.31),  we  find  that 


\[
\frac{Q_N}{Q_0} = \epsilon^N \tag{13.11}
\]


To measure transfer efficiency with the given test structure, the array was initially flushed
by executing the horizontal transfer sequence many times (> 100) with no input charge
introduced through the fill-and-spill device. The control voltages, Vref and Vin, were then
set to give a maximum size charge packet, and the fill-and-spill and transfer sequences were
executed repeatedly to move a series of equal size packets into the array. By measuring the
floating-gate amplifier output of the final node at the end of each transfer stage, the size
of the first charge packet in the series which arrives at the node can be compared with the
size of the following packets. After several transfer sequences, steady-state conditions will
be reached, in which as much charge is lost to the trailing packets as is picked up from the
previous ones, and the size of the packets will be approximately that of the original signal.

Figure 13-12: Charge transfer efficiency measurement.

^Private communication with Paul Suni of Orbit Semiconductor, Inc.

Figure 13-12 shows the results of one test which is typical of the behavior observed
in all of the chips. Approximating the floating-gate response as being linear with ΔQ,
the ratio of QN/Q0 is given by the ratio of the initial voltage drop from the zero-signal
level at stage 28, when the first packet arrives, to the final voltage drop measured several
stages later. As seen in the plot, this value is computed as ΔVi/ΔVf = .17, and from
equation (13.11) with N = 336, we find the per-gate transfer efficiency to be ε = 0.995.
Using N = 28 in equation (13.11), we can also compute εstage, the transfer efficiency per
stage, as εstage = 0.939.
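
The two efficiency figures follow from taking the Nth root of the measured charge ratio.
The sketch below assumes the relation of equation (13.11), that the packet ratio after N
transfers equals the efficiency raised to the Nth power; the function name is illustrative.

```python
def transfer_efficiency(charge_ratio, n_transfers):
    """Solve charge_ratio = eps ** n_transfers for the efficiency eps."""
    if not 0.0 < charge_ratio <= 1.0:
        raise ValueError("charge ratio must lie in (0, 1]")
    return charge_ratio ** (1.0 / n_transfers)


MEASURED_RATIO = 0.17   # initial voltage drop / steady-state drop (from the text)
GATES = 336             # 28 cells x 12 gates per cell
STAGES = 28             # one stage per cell

eps_gate = transfer_efficiency(MEASURED_RATIO, GATES)
eps_stage = transfer_efficiency(MEASURED_RATIO, STAGES)
print(f"per gate:  {eps_gate:.3f}")
print(f"per stage: {eps_stage:.3f}")
```

Run as written, this prints 0.995 per gate and 0.939 per stage, matching the values quoted
above. Because the two results come from the same ratio, the per-stage figure is the
per-gate figure raised to the 12th power.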

The  measured  transfer  efficiencies  for  the  Orbit  CCD  process  are  very  low  compared  with 
desired  values  of  0.99999  or  better  required  for  large  gate  arrays.  As  seen  in  Section  13.5, 
the  low  CTE  of  these  devices  does  affect  the  results  of  the  32x32  prototype  array  and  limits 
our ability to characterize very large structures.

13.4  Differencing  and  Threshold  Test  Circuits 

Three  isolated  structures  were  laid  out  to  test  the  operation  of  the  edge  processing 
circuits  contained  in  the  unit  cells.  The  two  major  components,  the  differential  amplifier 
and  the  voltage  comparator,  were  set  up  to  be  tested  individually,  while  a  third  structure 
contained the complete absolute-value-of-difference and threshold test circuit.

13.4.1  Differential  amplifier 

Differential amplifier responses were measured for twelve different test structures under
the same conditions (Vbias = 1V, and common mode values of Vc = 1.2V, 2.0V, and 2.8V)
used in the simulation described in Section 12.2.1. The results from one of the test circuits
are shown in Figure 13-13. Overall, the basic shape of the response curve is very similar to
that predicted by the simulation, with good common mode rejection and differential gain,
Ad, of 7.2. However, for a given input difference, there are significant variations in the
output voltages of the twelve circuits due to variations both in the amplifier offset voltages
and in the values of the common mode outputs, Voc.

Figure 13-13: Measured differential amplifier characteristic for common mode voltages
Vc = 1.2V, 2.0V, and 2.8V with Vbias = 1V.

The average value of Voc among the twelve circuits was 3.43V, with a standard deviation
of 38mV. Referring to Figure 12-16, the common mode output voltage is determined by the
amount of drain current flowing through each side of the second-stage amplifier when it is
in its balanced state and by the effective resistance of the diode-connected p-fet load. Given
that the drain current in the balanced state is one-half of the current through the lower
bias transistor, it can easily be seen, using the notation of Section 12.2.1, that

\[
V_{oc} = V_{dd} - |V_{tp}| - (V_{bias} - V_{tn}) \sqrt{\frac{K_n (W/L)_{bias}}{2 K_p (W/L)_{load}}} \tag{13.12}
\]

where Vtn and Vtp are the n- and p-fet threshold voltages. Since the quantity under the
radical is much larger than one, variations in Voc thus depend strongly on variations in the
threshold voltages of the bias transistors.

The offset voltage of a differential pair, defined as the difference in the inputs V1 - V2
required to make the output voltages equal, is, from [98],


Vos  =  AVi  -I- 


9mp^ 


(Mwim 

V  W/L  J 


(13.13)

where ΔVt is the difference in threshold voltages of the two input transistors, Id is the
average drain current through each side, (W/L) is the average size ratio of the input transistors,
and gmp is the average transconductance of the two p-fet loads. The quantities Δ(1/gmp)
and Δ(W/L) represent the differences in these parameters between the two sides of the
differential pair.

Figure 13-14: Combined maximum differential amplifier outputs vs. |V1 - V2| for common
mode voltages Vc = 1.2V, 2.0V, and 2.8V with Vbias = 1V.

The average offset voltage of the twelve circuits was +4.5mV, with a standard deviation
of 3.3mV. Given the very low bias current (~400nA with Vbias = 1V) and the large W/L
ratios used in the transistors, the variations in both Vos and Voc can be attributed almost
entirely to differences in the threshold voltages.

The effect of these variations on the overall operation of the MSV processor can be
judged from the plot, shown in Figure 13-14, of the maximum amplifier outputs for all
twelve circuits at each of the three common mode values against the absolute input difference
|V1 − V2|. This plot can be considered as a measurement of the resolution of the differential
amplifier and gives a lower bound on the resolution of the complete edge detection circuit.
For |V1 − V2| < 0.1V, the horizontal distance between the dashed lines, which approximates
the smallest measurable difference in the inputs, is roughly 28mV. For absolute differences
above 0.1V, however, the amplifier saturates rapidly so that it is impossible to distinguish
between differences that are greater than 0.14V. When considered as a percentage of the full
scale input range, if this is set to 1.4V, the 28mV to 140mV range of measurable differences
for the actual circuits is in fact better than the targeted swing of 2.5% to 10% of FSR.

Figure 13-15: Measured sense amplifier switching characteristic.

13.4.2  Sense  amplifier  voltage  comparator 

The sense amplifier switching characteristic was measured by setting one input, Vda, to
a fixed value and varying the other input, Vτ, until finding the point at which the comparator
output changed. Since, as explained in Section 12.2.2, the output is random when the two
inputs are closer than S/2, where S is the resolution of the sense amplifier, the actual
measurement used to determine the switching point was the number of times out of 100
that the comparator output was high. Two values, Vτ,low and Vτ,high, were thus measured
for each value of Vda, with the first being the highest value for which the output was
high < 1/100 times, and the second being the lowest value for which the output was high
> 99/100 times.

The cumulative results from twelve different comparator circuits are plotted in Figure 13-15,
with the dashed lines representing the envelope of the high and low threshold voltages
from all twelve circuits. The maximum horizontal distance between these lines, which is a
measure of the sense amplifier resolution, is 9.8mV. Given that this value is divided by the
differential gain, Kd, when the input is referred back to that of the differential amplifier,
the 10mV resolution of the comparator should have a negligible effect on the overall edge
detector circuit performance.
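
The switching-point measurement described above can be sketched in a few lines of Python. This is only an illustrative model, not the code used in the test system: the comparator function, its 10mV resolution, the 1mV sweep step, and all names are assumptions, and the "1 in 100" and "99 in 100" criteria are interpreted here as "at most 1" and "at least 99" high outputs.

```python
import random

def comparator(v_da_mv, v_tau_mv, delta_mv=10, rng=random):
    """Hypothetical comparator model (voltages in integer millivolts).

    The output is deterministic when the inputs differ by at least delta/2
    and random when they are closer than that.
    """
    diff = v_tau_mv - v_da_mv
    if 2 * diff >= delta_mv:
        return 1
    if 2 * diff <= -delta_mv:
        return 0
    return rng.randint(0, 1)

def measure_thresholds(v_da_mv, sweep_mv, trials=100, rng=random):
    """Return (v_low, v_high) for one fixed input, sweeping the other upward."""
    v_low = v_high = None
    for v_tau_mv in sweep_mv:
        highs = sum(comparator(v_da_mv, v_tau_mv, rng=rng) for _ in range(trials))
        if highs <= 1:
            v_low = v_tau_mv      # highest swept value with (almost) no high outputs
        if highs >= trials - 1 and v_high is None:
            v_high = v_tau_mv     # lowest swept value with (almost) all high outputs
    return v_low, v_high

rng = random.Random(0)
v_low, v_high = measure_thresholds(2000, range(1980, 2021), rng=rng)
print(v_low, v_high, v_high - v_low)
```

In this model the distance between the two measured values recovers the assumed resolution. For the measured circuits, dividing the 9.8mV envelope by a differential gain of 7.2 (assuming the gain reported for the differential amplifier applies) gives an input-referred resolution of roughly 1.4mV.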

It  should  be  noted  from  the  plot  that  there  is  a  slight  offset  of  approximately  15mV 
between  the  threshold  and  input  voltages  when  the  comparator  switches.  This  offset  is 
caused by the capacitive imbalance created by connecting an inverter input to the Vτ side.
Since  the  offset  is  constant,  however,  it  can  be  compensated  for  in  setting  the  value  of  the 
threshold  voltage. 

13.4.3  Combined  edge  veto  test  circuit 

The  complete  multi-scale  veto  edge  detection  circuit,  starting  from  the  inputs  to  the 
source-follower  buffers  of  the  floating-gate  amplifiers  and  ending  with  the  output  of  the 
edge  storage  latch,  was  laid  out  as  an  individual  test  structure  to  evaluate  the  combined 
performance  of  all  of  its  elements.  The  circuit  was  tested  by  providing  a  fixed  voltage 
difference to the source-follower inputs and determining the maximum value of Vτ for which
the  input  difference  could  be  considered  as  an  edge. 

Figures 13-16 and 13-17 show the results from one circuit, plotted against both V1 − V2
and |V1 − V2|, for input common mode values of 4.2V, 3.3V, and 2.5V, corresponding
roughly to the high, medium, and low floating-gate voltages. The common mode variation
in the results reflects that of the differential amplifier, while the offset in the circuit is the
combination of both the differential amplifier offset and the mismatch in the source-follower
buffers.

Figure 13-16: Measured response of one absolute-value-of-difference circuit.

Figure 13-17: Measured response of one absolute-value-of-difference circuit, plotted
against |V1 − V2|.

It should be noted that the curves flatten much more abruptly for absolute input differences
greater than 110mV than those of the differential amplifier by itself. Closer analysis
of the combined circuit reveals that the cause of this abrupt flattening is the diminishingly
small current supplied to the sense amplifier by the high differential amplifier output side.
As the output approaches the saturation level, it becomes unable to charge the sense amplifier
input gates within the allotted precharge period. Increasing the precharge time is not an
effective solution to widening the range of the absolute-value-of-difference circuit, however,
as the time needed rises very rapidly as the current goes to zero. The preferred method for
increasing the range is to raise the rail voltage, Vdd, on the differential amplifier, thereby
increasing its saturation voltage. Raising Vdd will also increase power dissipation during
the edge detection cycles. However, as the time spent in edge detection is much less than
that required to load the image, the net increase in average power should be negligible.
Unfortunately, the test system was not designed to allow separate power supply voltages
for the edge detection circuits and the CCD clock drivers, and hence it was not possible to
implement this option in the present setup.

The  composite  results  from  twelve  different  absolute-value-of-difference  (AVD)  circuits 
for  common  mode  input  voltages  of  4.2V,  3.3V,  and  2.5V,  with  Vdd  =  5V,  are  plotted 
as  individual  points  in  Figure  13-18.  The  horizontal  distance  between  the  dashed  lines 
bounding  the  results  indicates  the  overall  resolution  of  the  edge  detection  circuits  for  the 
array processor. For values of |V1 − V2| < 90mV, this distance is approximately 65mV,
while for differences greater than 100mV, the distance becomes infinite. The value of 65mV
would  be  correct  for  the  lower  bound  of  distinguishable  differences  of  2.5%  FSR,  if  the  input 
range  is  2.6V.  The  upper  limit,  however,  is  clearly  inadequate  for  the  10%  of  FSR  which 
was  desired. 
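
The percentages quoted for the differential amplifier and for the complete AVD circuit can be checked with a short calculation. The sketch below simply restates the arithmetic; the function name is illustrative, and the full-scale ranges of 1.4V and 2.6V are the values assumed in the text.

```python
def percent_of_fsr(delta_mv, full_scale_mv):
    """Express a voltage difference as a percentage of the full-scale range."""
    return 100.0 * delta_mv / full_scale_mv

# Differential amplifier alone (Section 13.4.1), FSR taken as 1.4V.
diff_amp = (percent_of_fsr(28, 1400), percent_of_fsr(140, 1400))

# Complete AVD circuit, FSR taken as 2.6V.
avd = (percent_of_fsr(65, 2600), percent_of_fsr(100, 2600))

print(diff_amp)  # lower and upper limits, in % of FSR
print(avd)
```

The amplifier by itself spans 2% to 10% of FSR, while the complete circuit reaches the 2.5% lower bound but saturates at just under 4% of FSR, well short of the 10% target.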

13.5  Operation  of  the  Full  Array  Processors 

Two  array  sizes  were  built  to  test  the  operation  of  the  complete  MSV  processor.  A 
32x32  array — the  largest  which  would  fit  on  the  maximum  available  die  size — was  laid  out 
as  a  separate  chip,  while  a  smaller  4x4  array  was  included  on  the  test  structures  chip. 
Given the poor charge transfer efficiency measured for the CCDs and the limited resolution
of the AVD circuit, it was clear that it would not be possible to test very precisely the
processor’s ability to discriminate between step edges, lines, and impulse noise as described
in Chapter 5. The low CTE in effect results in a pre-smoothing operation as the image
is loaded, while the limited AVD resolution restricts the number of smoothing cycles for
which interesting results can be obtained. Nonetheless, it was possible to test several general
characteristics of the array processors and verify that their overall operation was as planned.

Figure 13-18: Composite response of twelve AVD circuits.

The first test performed on the 4x4 and 32x32 arrays was to compare the I/O characteristics
of the different rows. This was done by loading an entire column with the same
value of Vin − Vref and measuring the floating-gate amplifier outputs of each row as the
column was shifted to the end of the array. The results are shown for all rows of one 4x4
array in Figure 13-19 and for rows 3 through 16 of one 32x32 array in Figure 13-20. The
curve for the 4th row of the smaller array is seen to be significantly shifted above those for
the other three rows due to charge loss in the input shift register. Since the last row is the
first to be input, it receives the smallest input charge packet while the preceding rows receive
successively larger packets until steady-state conditions are reached (see Figure 13-12
for reference). The full effect of charge loss in the shift register is thus observed in the plot
of the 4x4 array output, while the output curves for the 14 rows at the top of the 32x32
array are indicative of the steady-state results. The chips were frozen prior to testing so
that dark current would not be a significant factor.

Figure 13-20: Floating-gate outputs of 32x32 array processor.

Figure 13-21: Smoothing of one-pixel impulse on 4x4 array. Panels: a.) 0 smoothing cycles; c.) 2 smoothing cycles.

The small size of the 4x4 array was nonetheless convenient for testing the smoothing
and edge veto functions. A test input was provided to the array consisting of a single
pixel impulse with Vsig = Vin − Vref + ^pi-p2 = 1.199V at the 2nd row and 3rd column.
Figure 13-21a shows the input data, along with the floating-gate amplifier output of each
row with no smoothing. Again, due to the poor charge transfer efficiency, the impulse is
spread along the 2nd row over the first and second columns as well as the third. Referring
to the I/O transfer function curves plotted in Figure 13-19, the output value of 2.075V
measured at the original location of the impulse corresponds to an input of Vsig ≈ 0.985V,
which is very close to the value of 0.943V predicted using a per-gate CTE of 0.995 given
that the charge is transferred through 48 gates before reaching the output device.

Figure 13-22: Edge detection results for 4x4 array with impulse input. Panel c.) smoothing cycle 2, τ2 = 3.848.
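
The predicted value can be reproduced with a one-line model. The sketch below assumes that the signal scales in proportion to the charge remaining in the packet and that the same fraction is lost at every gate; the function name is illustrative.

```python
def signal_after_transfers(v_sig, cte_per_gate, n_gates):
    """Signal remaining after n_gates transfers with a fixed per-gate CTE."""
    return v_sig * cte_per_gate ** n_gates

predicted = signal_after_transfers(1.199, 0.995, 48)
print(round(predicted, 3))  # about 0.943
```

Even with a per-gate efficiency of 0.995, only about 79% of the original packet survives 48 transfers, which is why the impulse appears smeared over the neighboring columns.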

The  array  outputs  after  one  and  two  smoothing  cycles  are  shown  in  Figures  13-21b  and 
13-21c  next  to  the  values  predicted  by  applying  the  binomial  kernels  of  equations  (12.8) 
and  (12.9)  directly  to  the  unsmoothed  outputs  given  in  Figure  13-21a.  Comparing  the 
results,  it  can  be  seen  that  the  actual  and  predicted  values  are  within  10-20mV  of  each 
other  and  indicate  that  the  smoothing  operation  does  in  fact  closely  approximate  a  2-D 
binomial  convolution.  It  should  be  noted  that  this  method  for  generating  the  theoretical 
values  is  acceptable  as  long  as  the  outputs  lie  within  the  (approximately)  linear  range  of 
the  floating-gate  amplifiers.  Given  the  amount  of  variability  in  the  I/O  transfer  functions 
of  each  row,  this  method  is  also  preferable  to  that  of  translating  the  output  data  to  their 
equivalent  inputs  and  then  performing  the  convolutions. 
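
The procedure used to generate the predicted values can be illustrated as follows. This is a sketch only: equations (12.8) and (12.9) are not reproduced in this section, so the separable 3x3 binomial kernel and the edge-replication boundary rule used here are assumptions, as are all names.

```python
def binomial_smooth(image):
    """Apply one pass of the kernel (1/16)[[1,2,1],[2,4,2],[1,2,1]].

    Pixels outside the array are replaced by the nearest pixel inside it
    (an assumed boundary rule).
    """
    rows, cols = len(image), len(image[0])
    weights = (1, 2, 1)
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += weights[dr + 1] * weights[dc + 1] * image[rr][cc]
            out[r][c] = acc / 16.0
    return out

image = [[0.0] * 4 for _ in range(4)]
image[1][2] = 1.0                 # impulse at the 2nd row, 3rd column
once = binomial_smooth(image)     # one smoothing cycle
twice = binomial_smooth(once)     # two smoothing cycles
for row in once:
    print(row)
```

Applying the function to the measured, unsmoothed outputs rather than to an ideal impulse corresponds to the comparison made in the text.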

The edge detection/veto results for the same input pattern are shown in Figures 13-22a
through 13-22c. The threshold values of τ0 = 3.872, τ1 = 3.860, and τ2 = 3.848 were chosen
from Figure 13-17 according to the expected differences in the pixel values at each smoothing
cycle. Edges are indicated in the diagrams by the horizontal and vertical lines between the
‘X’s that mark the pixels in the array.

With no smoothing, edges are found between every vertical pair of pixels in the first
column  and  between  the  first  and  second  rows,  between  the  vertical  pairs  of  columns  1-3 
in  the  second  and  third  rows,  and  between  each  horizontal  pair  of  the  second  row.  Some  of 
these  edges  are  clearly  in  error.  It  turns  out  that  the  edges  shown  between  the  vertical  pairs 
of  the  first  column  are  meaningless  since  they  are  caused  by  a  bad  connection  at  pin  9  of 
the  ‘C40  bus  interface  which  receives  these  signals  (see  the  die  photograph  of  the  test  chip 
in  Figure  13-2).  Unfortunately,  this  problem,  which  could  be  repaired  only  by  rebuilding 
the  ‘C40  daughter  board,  was  not  discovered  until  relatively  late  in  the  testing  phase  when 
there  was  not  enough  time  left  to  remake  the  board. 

Figure 13-23: Test image used on 32x32 processor and corresponding edges.

With the exception of the edge between the vertical pair at the top of the last column,
which can be explained only by an extreme offset in the absolute-value-of-difference circuit
at that location, the other edges found appear to be plausible as they occur around the
smeared  impulse  input.  After  one  smoothing  cycle  the  edge  between  the  horizontal  pair  at 
the  center  of  the  second  row  is  removed,  and  after  two  cycles,  all  of  the  edges  except  the 
vertical  edges  between  the  first  and  second  rows  and  the  meaningless  edges  of  column  1 
are  removed.  The  fact  that  the  edges  between  the  top  two  rows  persist  may  be  partially 
explained  by  the  fact  that  the  differences  between  rows  1  and  2  are  not  smoothed  away  as 
strongly  as  those  between  rows  2  and  3  due  to  the  effect  of  the  array  boundary.  Another 
more  likely  explanation,  however,  is  the  variation  in  edge  threshold  values  between  the 
different  AVD  circuits. 

Edge  detection  on  the  32x32  processor  was  also  tested  by  supplying  a  sample  input 
image  and  recording  the  edge  outputs.  The  input  image  used  in  one  test,  which  is  shown 
at  the  left  of  Figure  13-23,  was  sampled  down  to  32x32  pixels  from  an  original  256x256 
image.  Edges  are  displayed  by  coloring  one  of  the  adjacent  image  pixels  in  different  shades 
of  gray.  An  image  pixel  which  has  a  vertical  edge  directly  above  it  is  colored  light  gray, 
while  a  pixel  with  a  horizontal  edge  on  its  left  side  is  colored  dark  gray.  If  edges  exist  both 
above  and  to  the  left  of  an  image  pixel,  it  is  colored  black. 
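
The display convention just described amounts to a small lookup per pixel. The following sketch shows one way to express it; the particular gray levels and the choice to leave edge-free pixels unchanged are assumptions made for illustration, as are the function names.

```python
LIGHT_GRAY, DARK_GRAY, BLACK = 192, 64, 0   # hypothetical 8-bit gray levels

def display_value(pixel, edge_above, edge_left):
    """Gray level used to display one pixel given its two edge flags."""
    if edge_above and edge_left:
        return BLACK
    if edge_above:
        return LIGHT_GRAY
    if edge_left:
        return DARK_GRAY
    return pixel   # assumption: pixels with no adjacent edge are left as they are

def render_edges(image, edges_above, edges_left):
    """Build the edge display for a whole image from per-pixel edge flags."""
    return [[display_value(image[r][c], edges_above[r][c], edges_left[r][c])
             for c in range(len(image[0]))]
            for r in range(len(image))]

demo = render_edges([[10, 20], [30, 40]],
                    [[0, 1], [0, 1]],
                    [[0, 0], [1, 1]])
print(demo)
```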

The  results,  shown  on  the  right  side  of  Figure  13-23,  are  not  simple  to  interpret  given 
that  we  do  not  know  the  actual  signal  levels  stored  in  the  array.  With  the  32x32  array,  the 
floating-gate  outputs  of  only  the  top  16  rows  were  brought  to  output  pads  as  this  was  the 
maximum  number  of  A/D  channels  available  in  the  test  system.  Even  if  we  did  have  the 
outputs from all 32 rows, however, we would still not have an accurate representation of the
internal signal levels since they would be further distorted as they were transferred to the
output devices. Nonetheless, one can discern some general outlines, such as the edges found
around the face area and near the shoulders and neck. The vertical black line towards the
right-hand side of the edge image, on the other hand, is due to the previously discussed bad
connection at pin 9 of the ‘C40 daughter board.

13.6  Recommendations  for  Improving  the  Design 

Several  problems  were  uncovered  in  testing  the  circuits  used  in  the  MSV  processors 
which  prevented  all  of  the  design  goals  from  being  met.  Some  of  these  problems  had  a 
trivial  solution,  such  as  redesigning  the  test  system  to  provide  a  separate  rail  voltage  to  the 
differential  amplifiers,  while  others,  such  as  the  poor  charge  transfer  efficiency  of  the  CCDs, 
could  not  be  solved  without  changing  the  fabrication  process.  Nonetheless,  it  is  clear  from 
the  overall  results  of  the  individual  test  circuits  and  the  edge  detection  and  smoothing  tests 
on  the  arrays  that  given  the  proper  resources,  a  processor  can  be  built  which  does  meet  the 
design  goals  specified  for  the  system.  In  this  sense,  the  results  of  this  research  are  positive. 

After  thoroughly  studying  the  advantages  and  limitations  of  the  current  design,  however, 
several  changes  to  the  array  architecture  which  would  greatly  improve  its  effectiveness  in 
the  motion  estimation  system  are  now  apparent.  One  problem  with  the  present  design  is 
that the unit cell, which measures 224μm × 224μm, is too large. Even if scaled down by a
factor of 4 to 56μm × 56μm, one could at best build a 160x160 array on a 1cm die, while
for  reliable  motion  estimation,  the  minimum  array  size  needed  is  closer  to  256x256. 
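
A rough geometric bound makes the size problem concrete. The sketch below ignores the pad ring and all peripheral circuitry, so it gives only an upper limit on the array dimension; the function names are illustrative.

```python
def max_cells_per_side(die_side_um, cell_side_um):
    """Upper bound on the number of unit cells along one side of the die."""
    return int(die_side_um // cell_side_um)

def max_cell_side_um(die_side_um, cells_per_side):
    """Largest cell side that lets cells_per_side cells fit along the die."""
    return die_side_um / cells_per_side

print(max_cells_per_side(10000, 56))   # 56um cells on a 1cm die
print(max_cell_side_um(10000, 256))    # cell side needed for a 256x256 array
```

The bound of 178 cells per side for a 56μm cell is consistent with the estimate of at best 160x160 once space for pads and support circuits is set aside (an assumption about how that figure was obtained), and a 256x256 array would need a cell side below about 39μm.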

The  unit  cell  could  be  greatly  reduced  by  removing  most  of  the  edge  detection  circuits 
and  bringing  them  outside  the  array,  as  shown  in  Figure  13-24,  where  they  can  be  shared 
by  all  cells  in  one  row  or  column.  The  only  circuits  absolutely  needed  at  each  pixel  are  the 
floating-gate  amplifier  for  sensing  signal  levels  and  the  latches  for  storing  the  vertical  and 
horizontal  edge  charges.  The  new  unit  cell  structure,  illustrated  in  Figure  13-25,  could  be  as 
much as a factor of two smaller than the current design. Furthermore, bringing the absolute-value-of-difference
circuits outside the array increases the flexibility for further improving
their  design  for  better  matching  and  higher  resolution,  as  the  constraints  on  circuit  area  are 
no  longer  as  severe. 

Figure 13-24: Proposed architecture for focal-plane processor.

Figure 13-25: Unit cell structure in focal-plane processor.

Changing from pixel-parallel to row- and column-parallel processing will of course increase
the total time needed for edge detection. One advantage to the reduced structure,
however, is that it should be possible to use this array for imaging as well as for edge detection
since, unlike the present design, it does not require a large fraction of the total pixel
area to be allocated to n-wells and diffusions held at Vdd which can both trap and sink
light-generated  charge.  The  current  array  structure,  on  the  other  hand,  is  not  as  well-suited 
for  imaging;  and  if  it  were  to  be  used  in  the  motion  estimation  system,  not  only  would  a 
secondary  imaging  device  be  needed,  but  the  time  required  to  load  images  into  the  array 
would  also  have  to  be  taken  into  consideration.  The  suggested  design  improvement  would 
thus  not  significantly  affect  overall  processing  time,  and  would  in  fact  reduce  the  complexity 
of  the  system  by  removing  the  need  for  the  additional  sensor. 

Making  the  suggested  changes,  it  should  be  possible  to  build  a  256x256  focal-plane  MSV 
processor on a single chip using a 0.5μm, or smaller, CCD-CMOS process. In Chapter 17,
we  will  examine  how  the  complete  real-time  motion  estimation  system  could  be  assembled 
with  this  chip,  along  with  the  matching  processors  presented  in  Part  III. 


Part  III 

A Mixed Analog/Digital Edge Correlator

Chapter 14


Design  Specifications 


It is useful to recall the basic steps of the matching procedure, presented in Chapter 6,
which are performed on each $M \times M$ block in the base edge map. The same notation is
used as before where $b$ denotes the 1-bit value of an individual pixel in the block being
matched, and $s$ denotes the value of an individual pixel in the search window. In addition,
we let $P = M^2$ represent the total number of pixels in the block, and define $V_{ij}$ as the sum
of absolute values of difference computed at position $(i, j)$ in the search window. The steps
of the matching procedure are summarized as follows:

1. Count the number of edge pixels in the block from the base edge map to find $\|B\|$.

2. Compute $\alpha_1\|B\|$ and test that

\[ V_l < \alpha_1\|B\| < V_h \tag{14.1} \]

where

\[ V_h = \alpha_1 \frac{P}{2}, \quad \text{and} \quad V_l = \alpha_2 P \tag{14.2} \]

and $\alpha_1, \alpha_2$ are constants chosen to allow acceptable detection and false-alarm rates,
satisfying

\[ 1 > \alpha_1 > 2\alpha_2 > 0 \tag{14.3} \]

as described in Section 6.1.

3. If $\alpha_1\|B\|$ is outside these bounds, stop. The block cannot produce an acceptable
match. Otherwise, for each position of the search window:

(a) Compute, for each pixel in the block, the binary function $b\bar{s} + \bar{b}s$ and sum these
values over the entire block to find the score

\[ P V_{ij} = \sum \left( b\bar{s} + \bar{b}s \right) \tag{14.4} \]

(b) If $P V_{ij} < \alpha_1\|B\|$:

i. Store the current value of $(i, j)$ as a candidate match position.

ii. If $V_{ij} < V_{min}$, the current minimum score, set $V_{min} = V_{ij}$ and store the
current value of $(i, j)$ as the best position.

4. After the score has been computed at every position of the search window, if at least
one candidate match has been found, compute $\Delta x$ and $\Delta y$, the maximum spread in the
$x$ and $y$ coordinates of the candidates, and test if both $\Delta x < d_{max}$ and $\Delta y < d_{max}$,
where $d_{max}$ is the maximum possible spread for considering that the minimum is
unique and well localized. If the results of both tests are true, signal that the match
at the position of $V_{min}$ is acceptable.
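
The steps above can be sketched in software as follows. This is a minimal reference model written for illustration, not a description of the circuit: the function name, the list-of-lists image representation, the exhaustive raster search, and the default parameter values (the constants 0.5 and 0.075 correspond to the values of 0.5 and 0.15 quoted later for the first constant and for the ratio of the two; the spread limit of 2 is arbitrary) are all assumptions.

```python
def match_block(block, search, alpha1=0.5, alpha2=0.075, d_max=2):
    """Return (best_position, accepted) for an M x M binary block.

    block  : M x M list of 0/1 edge values from the base edge map
    search : larger 2-D list of 0/1 edge values (the search window)
    """
    m = len(block)
    p = m * m
    norm_b = sum(sum(row) for row in block)            # step 1: ||B||
    v_low, v_high = alpha2 * p, alpha1 * p / 2         # (14.2)
    if not (v_low < alpha1 * norm_b < v_high):         # steps 2 and 3: (14.1)
        return None, False

    candidates, best, best_score = [], None, None
    for i in range(len(search) - m + 1):
        for j in range(len(search[0]) - m + 1):
            # step 3(a): P*V_ij is the number of mismatched pixels (b XOR s)
            score = sum(block[r][c] ^ search[i + r][j + c]
                        for r in range(m) for c in range(m))
            if score < alpha1 * norm_b:                # step 3(b)
                candidates.append((i, j))
                if best_score is None or score < best_score:
                    best_score, best = score, (i, j)

    if not candidates:
        return None, False
    # step 4: the candidates must be tightly clustered in both coordinates
    spread_i = max(i for i, _ in candidates) - min(i for i, _ in candidates)
    spread_j = max(j for _, j in candidates) - min(j for _, j in candidates)
    return best, (spread_i < d_max and spread_j < d_max)

block = [[1, 0, 0, 1],
         [0, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 1, 0]]
search = [[0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4):
        search[2 + r][3 + c] = block[r][c]   # embed the block at offset (2, 3)

result = match_block(block, search)
print(result)   # ((2, 3), True)
```

The 4x4 block is used only to keep the example small; the block sizes needed for reliable matching are discussed in Section 14.1.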

In order for the matching circuit to produce useful data for computing motion, three
primary constraints must be satisfied. The first of these is that the block size, $P$, must
be large enough to ensure that we can find constants $\alpha_1$ and $\alpha_2$, as defined in step 2
above, to give adequate detection and false-alarm rates. The second constraint is that the
search window must be large enough to account for the maximum displacements in the
image plane caused by the motion. Finally, the circuit must be designed to compute the
quantities $\alpha_1\|B\|$ and $PV_{ij}$ with sufficient precision to accurately perform the validation
tests and find the minimum score.

These constraints, which will be examined more closely in the following sections, determine
the minimum design specifications for the matching processor. Of course, the best
design, given the requirements of the motion system, will be the one that not only meets
these specifications, but also consumes the least power and silicon area. In the next two
chapters, we will look at several different implementations of the matching procedure to
find the one which is best according to all of these criteria.

14.1 Required Block Size

We can find the minimum required block size from equations (6.16) and (6.17) for the
mean and variance of $V_{ij}$ under the hypothesis, $H_0$, that the match is false:

\[ \mu_{H_0,\|B\|} = \frac{\|B\|}{P}(1 - p_s) + \left(1 - \frac{\|B\|}{P}\right) p_s \tag{14.5} \]

\[ \sigma^2_{H_0,\|B\|} = \frac{p_s(1 - p_s)}{P} \tag{14.6} \]

where $p_s$ is the probability that an individual pixel in the search window will be an edge
pixel. Since $\|B\|/P < 1/2$, due to the validation test (14.1), and $p_s > 0$, we know that

\[ \mu_{H_0,\|B\|} \ge \frac{\|B\|}{P} \tag{14.7} \]

Given that $P = M^2$, we also have the following bound on the variance

\[ \sigma^2_{H_0,\|B\|} \le \frac{1}{4M^2} \tag{14.8} \]

with equality being achieved only for $p_s = 1/2$.

In order to give a low false-alarm rate, the threshold, $\tau$, for deciding to consider a match
as a candidate is chosen such that

\[ \sigma_{H_0,\|B\|} \ll \left| \tau - \mu_{H_0,\|B\|} \right| \tag{14.9} \]

Let $n$ be the smallest number such that

\[ \left| \tau - \mu_{H_0,\|B\|} \right| \ge n\,\sigma_{H_0,\|B\|} \tag{14.10} \]

for all permissible values of $\|B\|$. From the validation test (14.1), we have

\[ \frac{\|B\|}{P} \ge \frac{\alpha_2}{\alpha_1} \tag{14.11} \]

while the cutoff threshold is given by $\tau = \alpha_1\|B\|/P$. Combining these equations with the
inequality (14.7) gives

\[ \left| \tau - \mu_{H_0,\|B\|} \right| \ge \frac{\alpha_2}{\alpha_1}(1 - \alpha_1) \tag{14.12} \]

We can thus ensure that the inequality (14.10) is satisfied for some suitably large value of
$n$ by choosing $M$ such that, given $\alpha_1$ and $\alpha_2$,

\[ \frac{\alpha_2}{\alpha_1}(1 - \alpha_1) \ge \frac{n}{2M} \tag{14.13} \]

or,

\[ M \ge \frac{n}{2(\alpha_2/\alpha_1)(1 - \alpha_1)} \tag{14.14} \]


In using these bounds to determine $M$, we need to set $\alpha_1$ as large as possible and
$\alpha_2/\alpha_1$ as small as possible so that a sufficiently large number of blocks from the base edge
map will pass the validation test (14.1). Otherwise, it may not be possible to find enough
correspondence points to obtain good motion estimates.

The values of $\alpha_1$ and $\alpha_2/\alpha_1$ most used in testing the matching procedure on real images
were 0.5 and 0.15, respectively. For the tests presented in Chapter 6, $M$ was set equal to 24
pixels, giving $n \ge 3.6$. As can be judged from the quality of the motion estimates listed in
Chapter 7, this value was adequate for achieving a low error rate. Based on these results,
we will thus require that the matching circuit be able to accommodate block sizes of at least
24x24.
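
The value of n quoted above follows directly from the block-size bound, which guarantees a separation of at least 2M times the ratio of the two constants times one minus the first constant, measured in standard deviations. A minimal sketch of the arithmetic (the function and argument names are illustrative):

```python
def sigma_margin(m, alpha1, alpha_ratio):
    """Guaranteed separation, in standard deviations, between the threshold
    and the false-match mean for an M x M block:
    n = 2 * M * (alpha2/alpha1) * (1 - alpha1).
    """
    return 2.0 * m * alpha_ratio * (1.0 - alpha1)

print(sigma_margin(24, 0.5, 0.15))    # values used in the tests of Chapter 6
print(sigma_margin(24, 0.5, 0.125))   # ratio rounded to a power of two
```

With M = 24 the guaranteed margin is 3.6 standard deviations for a ratio of 0.15, and it drops to 3.0 if the ratio is reduced to 1/8.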


14.2  Required  Search  Area 

In Section 2.3, image plane displacements were calculated, assuming a focal length
of $f = 200$ pixels, for three special cases: pure translation along the $x$ direction, pure
translation along the $z$ direction, and pure rotation of $\theta$ about the $y$ axis. At a frame
interval of 1/30 sec, the maximum absolute displacements were found to be

1. For pure translation along $x$: $88.8/Z$ pixels, where $Z$ is the distance in meters to the
object being viewed.

2. For pure translation along $z$: $(x, y)/300$ pixels where $(x, y)$ are image plane coordinates
measured from the principal point, and

3. For pure rotation of $\theta$ about the $y$ axis: 3.5 pixels per degree of rotation, at the center of
the image.

These  motions  are  quite  typical  of  those  that  could  be  encountered  in  actual  imaging 
situations. With $f = 200$ pixels, a sensor size of 256x256 pixels corresponds approximately
to  a  field  of  view  of  32.6°,  as  measured  from  the  z  axis,  which  is  close  to  the  largest  that 
can  be  obtained  from  ordinary  lenses  without  significant  distortion. 

As  can  be  seen  from  a  few  rough  calculations,  the  largest  displacements  are  caused  by 
rotation about an axis parallel to the image plane. For example, with 5° of rotation about
$y$, the smallest offset is 17.5 pixels at the center of the image. Combined with a translation
along $x$, the displacement could easily exceed 30 pixels if there are objects in the scene
closer than 10 meters. Even if the motion is primarily a translation along the $z$ axis, as
would  be  the  case  for  a  camera  mounted  on  the  front  of  a  car,  any  small  rotation  caused  by 
vibrations  or  by  turning  can  result  in  large  offsets. 
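
The rough calculations referred to above can be written out explicitly. The sketch below uses only the per-case figures listed earlier (3.5 pixels per degree of rotation and 88.8/Z pixels for translation along x); adding the two contributions directly assumes that they displace the image in the same direction, which is the worst case. All names are illustrative.

```python
import math

FOCAL_LENGTH_PIXELS = 200.0

def rotation_offset(degrees):
    """Displacement at the image center, in pixels, for rotation about y."""
    return 3.5 * degrees

def x_translation_offset(z_meters):
    """Maximum displacement, in pixels, for translation along x at depth Z."""
    return 88.8 / z_meters

def half_field_of_view_deg(sensor_pixels, focal_length=FOCAL_LENGTH_PIXELS):
    """Half-angle of the field of view, measured from the optical axis."""
    return math.degrees(math.atan((sensor_pixels / 2.0) / focal_length))

for z in (10.0, 7.0, 5.0):
    total = rotation_offset(5.0) + x_translation_offset(z)
    print(z, round(total, 1))

print(round(half_field_of_view_deg(256), 1))
```

Under this additive assumption the combined offset is about 26 pixels for an object at 10 meters and passes 30 pixels for objects closer than roughly 7 meters, so search windows of at least this extent are needed even for modest rotations.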

We  must  thus  plan  for  relatively  large  search  windows  of  anywhere  from  30x30  to 
200x200 pixels¹. Of course, increasing the pixel size would reduce the magnitude of the
displacements  in  number  of  pixels.  However,  it  would  also  increase  the  actual  image  area 
covered  by  a  single  block  since  the  required  number  of  pixels  per  block  would  not  change. 
As  the  area  covered  by  the  blocks  becomes  larger  with  respect  to  the  total  sensor  area,  not 
only  can  fewer  blocks  containing  different  features  be  extracted  from  the  image,  but  the 
error  in  assigning  the  correspondence  point  to  the  center  of  the  best  matching  block  in  the 
second  image  will  increase. 

In  addition  to  requiring  large  search  windows,  we  should  also  require  the  size  of  the 
window  to  be  adjustable  according  to  the  type  of  motion  which  is  expected.  For  example, 
if  the  motion  is  primarily  in  the  x  direction,  or  if  the  y  axis  is  the  primary  axis  of  rotation, 
as  in  the  case  of  the  turning  car,  the  image  plane  displacements  will  mostly  be  in  the  x 
direction,  and  hence  there  is  no  point  in  wasting  time  searching  over  large  y  offsets. 

It should be noted that the search area requirements for the matching circuits to be used
in the motion system are very different from those of the motion estimation chips typically
used in video applications for camera stabilization or image sequence compression. In these
applications, the camera is mostly stationary while the motion in the scene is caused by the
people or objects being filmed. The differences in successive frames in these situations are
usually small, and it is commonly assumed that maximum displacements are on the order
of ±8 pixels.

¹The window sizes for the astronaut and lab sequences shown in Part I were 120x120 and 200x60,
respectively.

14.3  Precision 

There are three tests performed to validate a candidate match. The first two test the
eligibility of the entire block by verifying that the total number of edge pixels is within the
acceptable upper and lower bounds. Combining equations (14.1) and (14.2), we have

$\alpha_2 P < \alpha_1 \|B\| < \alpha_1 P / 2$  (14.15)

The above expression is written in a manner to emphasize that the known quantities
$V_l = \alpha_2 P$ and $V_h = \alpha_1 P/2$ should be premultiplied and fed directly into the matching circuit,
rather than be computed on-chip. Only the value of $\alpha_1\|B\|$ needs to be computed by the
circuit, and this can be done as the block is being read in. The precision required for
representing $\alpha_1\|B\|$ depends on whether the comparison is performed digitally or in analog.
In an analog circuit, we only need to ensure that the difference $V_h - V_l$ is large enough to
discriminate a sufficient number of different levels in the value of $\alpha_1\|B\|$. In a digital circuit,
if $\alpha_1$ and $\alpha_2$ are negative powers of 2, the operations required to perform the comparisons
are trivial. If more precision is required, however, floating-point arithmetic must be used.

The closest powers of 2 to the values of 0.5 and 0.15 used in simulating the matching
procedure on the test image sequences are $\alpha_1 = 1/2$ and $\alpha_2/\alpha_1 = 1/8$. These numbers may
be adequate for many images; however, the lower value for $\alpha_2/\alpha_1$ will increase the
false-alarm rate. From equation (14.13) with M = 24, the values $\alpha_1 = 0.5$ and $\alpha_2/\alpha_1 = 0.125$
give n > 3, as opposed to n > 3.6 with $\alpha_2/\alpha_1 = 0.15$. Furthermore, there is not much
flexibility for tuning if $\alpha_1$ and $\alpha_2$ are restricted to negative powers of 2. Not implementing
the multiply and compare operations with some form of extended precision arithmetic will
thus reduce the robustness of the system.

The third validation test is to compare

$PV_{ij} < \alpha_1\|B\|$  (14.16)

at each position of the search window. The value of $PV_{ij}$ can be anywhere from 0 to
P, although, since $\alpha_1\|B\| < \alpha_1 P/2$, we only need to represent values up to $\alpha_1 P/2$. With
$\alpha_1 = 0.5$ and P = 576, $\alpha_1 P/2 = 144$, which requires 8 bits to represent digitally. Precision
issues with the representation of $PV_{ij}$, as well as $\alpha_1\|B\|$, are thus a concern primarily for
analog implementations. It is certainly not necessary to require the circuit to discriminate
a full 144 levels; however, we do need to ensure that the threshold test (14.16) is accurate to
at least a fraction of $\|B\|$, and we also need to ensure that the circuit can discriminate
between different candidate minimum values of $PV_{ij}$.
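The three validation tests can be summarized in a few lines. The following is only a minimal sketch: the function names are hypothetical, and the thresholds are passed in premultiplied, as the text recommends for $V_l$ and $V_h$.

```python
def block_is_eligible(edge_count, v_low, v_high, alpha1=0.5):
    """Tests (14.15): V_l < alpha1*||B|| < V_h.

    edge_count is ||B||, the number of edge pixels in the base block.
    v_low and v_high are supplied premultiplied (alpha2*P and alpha1*P/2).
    """
    scaled_edges = alpha1 * edge_count
    return v_low < scaled_edges < v_high


def score_is_valid(score, edge_count, alpha1=0.5):
    """Test (14.16): the score P*V_ij must be below alpha1*||B||."""
    return score < alpha1 * edge_count
```

For example, with P = 576 and $\alpha_1 = 0.5$, the upper threshold is 144. Taking $\alpha_2 = 0.15\,\alpha_1$ (one reading of the simulation values quoted above) gives a lower threshold of 43.2, so a block with 100 edge pixels is eligible and a score of 49 at some offset passes the threshold test.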

Any difference between scores that is within one standard deviation of the expected
minimum value cannot be considered significant. From equation (6.14), the mean and
variance of $PV_{ij}$ under the hypothesis $H_1$ that the match is correct are given by

$E[PV_{ij} \mid H_1] = P p_n$  (14.17)

$\mathrm{Var}(PV_{ij} \mid H_1) = P p_n (1 - p_n)$  (14.18)

where $p_n$ is the probability of an edge pixel being turned on or off by noise. Suppose
$p_n = 0.05$; with M = 24 we have

$\sigma_{H_1} = \sqrt{\mathrm{Var}(PV_{ij} \mid H_1)} = M\sqrt{p_n(1 - p_n)} = 5.23$  (14.19)

In order to have $\sigma_{H_1} < 1$, it would be necessary to have $p_n < .00174$, which is
much lower than can be reasonably expected. Being able to discriminate between scores
that are within 3 or 4 votes of each other should thus be sufficient. From inequality (14.8),
we also see that

[equation not legible in the source]  (14.20)

and hence requiring that the circuit be able to discriminate more than 48 different values of
$\alpha_1\|B\|$ will ensure that the threshold test (14.16) can be performed to an accuracy greater
than $\sigma_{H_0,\|B\|}$.
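The numbers quoted for the noise analysis can be checked directly from the binomial variance $P p_n (1 - p_n)$ with $P = M^2$. The short sketch below does this; the function names are illustrative and not part of the original design.

```python
import math


def score_std_correct_match(block_side, noise_prob):
    """Standard deviation of the score under a correct match.

    Uses Var = P * p_n * (1 - p_n) with P = block_side**2.
    """
    return block_side * math.sqrt(noise_prob * (1.0 - noise_prob))


def noise_prob_for_unit_std(block_side):
    """Smaller root of P * p * (1 - p) = 1, the noise level at which the std is one vote."""
    pixels = block_side * block_side
    return (1.0 - math.sqrt(1.0 - 4.0 / pixels)) / 2.0
```

With a 24x24 block and a noise probability of 0.05 the first function gives about 5.23 votes, and the second gives a bound of about 0.00174, matching the values stated in the text.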
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Case  Study:  A  Purely  Digital  Design 


The  matching  procedure  can  be  implemented  by  two  very  different  architectures,  both 
involving  fully  parallel  array  processing.  In  the  first  method,  pixels  from  the  block  being 
matched  are  stored  at  the  nodes  of  the  array  while  the  pixels  from  the  search  window  are 
shifted  across  it.  A  score  is  computed  at  each  shift  cycle  and  compared  with  the  current 
minimum  value.  If  it  is  smaller,  the  minimum  value  is  updated  and  the  position  of  the 
search  window  is  recorded.  Once  the  entire  search  window  has  been  processed,  if  all  of  the 
validation  tests  have  been  passed,  the  offset  corresponding  to  the  minimum  score  is  reported 
as  the  position  of  the  best  match. 
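As a software analogy for the first architecture, the sketch below holds the base block fixed, slides it over every offset of the search window, counts XOR mismatches, and keeps the running minimum. It is an illustration only: the function name is hypothetical, images are plain lists of 0/1 values, and the validation tests are omitted.

```python
def match_by_shifting(block, window):
    """Return (min_score, (dy, dx)) for a binary block within a binary search window.

    block is M x M, window is at least M x M; both hold 0/1 edge values.
    One iteration of the two outer loops corresponds to one shift cycle.
    """
    m = len(block)
    best = None
    for dy in range(len(window) - m + 1):
        for dx in range(len(window[0]) - m + 1):
            # XOR at every node, then tally the votes for this offset.
            score = sum(
                block[r][c] ^ window[dy + r][dx + c]
                for r in range(m)
                for c in range(m)
            )
            # Update the stored minimum only on a strictly smaller score.
            if best is None or score < best[0]:
                best = (score, (dy, dx))
    return best
```

The number of loop iterations equals the number of offsets in the search window, which is the quantity that limits the speed of this architecture.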

In  the  second  architecture,  which  has  been  used  in  some  commercially  available  motion 
estimation chips^, the entire search window is stored in a processor array where each node
corresponds  to  a  given  offset.  Each  pixel  from  the  base  image  block  is  broadcast  to  the 
entire  array  so  that  its  difference  with  every  pixel  of  the  search  window  can  be  computed 
simultaneously  and  the  results  added  to  the  current  scores  stored  at  each  node.  The  pixels 
of  the  search  window  are  then  shifted  so  that  their  offset  relative  to  the  next  base  pixel  to 
be  processed  corresponds  to  the  offset  assigned  to  their  new  array  location.  Once  all  the 
pixels  from  the  base  image  block  have  been  processed,  the  validation  tests  are  performed 
and  the  scores  stored  at  each  node  are  compared  to  find  the  one  with  the  minimum  value. 

In  the  next  two  sections,  I  will  discuss  how  the  matching  circuit  could  be  designed,  given 
the  constraints  presented  in  the  last  chapter,  using  each  of  these  architectures. 

^For example, the STI3220 motion estimation processor from SGS Thomson, which was briefly discussed
in Chapter 3.

15.1 First Method: Moving the Search Window

The structure of the M × M processing array needed for the first architecture is shown
in Figure 15-1 with the block diagram of the individual processing cells given in Figure 15-2.
Each bit from the base image block is loaded and stored in a latch for the duration of the
search. During the load phase, the edge pixels are counted and the value of $\alpha_1\|B\|$, as
defined in the preceding chapter, is computed and stored in a register so that it can be used
for the threshold test performed on each score. The validation test comparing $\alpha_1\|B\|$ to
the external inputs $V_h = \alpha_1 P/2$ and $V_l = \alpha_2 P$ can be performed once the entire block from
the base image is loaded to determine if it is necessary to start the search procedure.
Scoring begins as soon as the block corresponding to the first offset in the search window
is moved into position by means of the shift register cells located at each processing node.

The complexity of this design is not in computing the score at each pixel, which is a
simple XOR operation, but in tallying the scores from every node in the array. Counting
the scores must be performed as quickly as possible as there are many offsets in the search
window and many blocks in the base image to match. Unless prevented by space restrictions
on the chip, the tally function should thus be implemented as a single combinational logic
circuit to avoid wasting cycles. Furthermore, even though we only need to represent numbers
up to the value of $\alpha_1\|B\|$, all of the votes must be counted, as we cannot know in advance
which nodes will have a 'high' output and which will not.

Building a tally circuit to count the votes from the nodes is expensive both in area and in
delay. Figure 15-3 shows the construction procedure for building a $2^n - 1$ vote counter
with an n-bit output. A full adder can tally the three votes from $P_0$, $P_1$, and $P_2$ giving the
2-bit output $\{t_1 t_0\}$, where

$t_0 = P_0 P_1 P_2 + P_0 \bar{P}_1 \bar{P}_2 + \bar{P}_0 P_1 \bar{P}_2 + \bar{P}_0 \bar{P}_1 P_2$  (15.1)

and

$t_1 = P_0 P_1 + P_0 P_2 + P_1 P_2$  (15.2)

A 7-vote tallier can be constructed by adding the results from two 3-vote talliers and
connecting the seventh node to the carry-in input of the 2-bit adder. Generalizing this
procedure, it is easily seen that a $2^n - 1$ vote counter can be built from two $2^{n-1} - 1$ talliers
and one (n − 1)-bit adder, as shown in Figure 15-3c.
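The recursive construction can be mimicked in software. The sketch below builds a $2^n - 1$ vote tallier from full adders exactly as described: two half-size talliers feed a ripple-carry adder, and the last vote drives its carry-in. The function names and the little-endian bit ordering are choices made for this illustration.

```python
def full_adder(a, b, c):
    """Return (sum_bit, carry_bit) for three 1-bit inputs."""
    t0 = a ^ b ^ c                      # odd parity of the three votes
    t1 = (a & b) | (a & c) | (b & c)    # majority of the three votes
    return t0, t1


def ripple_add(x_bits, y_bits, carry_in):
    """Add two equal-length little-endian bit lists plus a carry-in bit."""
    out = []
    carry = carry_in
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]


def tally(votes):
    """Count 2**n - 1 one-bit votes; returns an n-bit little-endian result."""
    count = len(votes)
    assert count >= 1 and (count + 1) & count == 0, "need 2**n - 1 votes"
    if count == 1:
        return [votes[0]]
    half = (count - 1) // 2
    left = tally(votes[:half])
    right = tally(votes[half:2 * half])
    # The remaining vote is connected to the carry-in of the adder.
    return ripple_add(left, right, votes[-1])


def bits_to_int(bits):
    """Convert a little-endian bit list to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For three votes the recursion reduces to a single full adder, and for seven votes it uses two 3-vote talliers plus a 2-bit adder, as in the text.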
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Figure  15-1:  Processing  array  with  base  image  block  held  in  fixed  position. 


Figure 15-2: Individual processing cell.

Figure 15-3: Tally circuit construction. Panel (c): $(2^n - 1)$-vote, n-bit tallier.

Let $A_n$ denote the number of full adders required to implement a $2^n - 1$ vote counter.
Clearly, $A_n$ satisfies the recursive relation

$A_n = 2A_{n-1} + n - 1$  (15.3)

and since $A_1 = 0$, it is easily verified that the solution to this recursion is given by

$A_n = 2^n - n - 1$  (15.4)
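The closed form can be checked numerically against the recursion. The following small sketch (with a hypothetical function name) evaluates the recursion with a one-vote counter needing no adders.

```python
def full_adders_recursive(n):
    """Full adders in a (2**n - 1)-vote counter, from A_n = 2*A_(n-1) + n - 1, A_1 = 0."""
    if n == 1:
        return 0
    return 2 * full_adders_recursive(n - 1) + (n - 1)
```

For n = 2, 3, 4 this gives 1, 4 and 11 full adders, which agrees with $2^n - n - 1$.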

The full $2^n - 1$ input tally circuit can be visualized as a tree with n − 1 levels having
$2^{n-i-1}$ i-bit adders at level i. The total delay in the circuit is thus the sum of the
worst case delays in the n − 1 levels. Two choices with different worst case delays are possible
for implementing the i-bit adders. The first is with ripple-carry, in which the carry bits
propagate sequentially through the i full adders, while the other is with carry-lookahead,
in which the carry bits are generated in parallel with combinational logic. If ripple-carry
adders are used, the worst case delay for an i-bit adder is id, where d is the delay for a
single full adder. The maximum delay for the full tally circuit would then be

$\sum_{i=1}^{n-1} i\,d = \frac{n(n-1)}{2}\,d$  (15.5)

which increases as $n^2$. If carry-lookahead adders are used, the delay can be reduced to O(n),
but at the cost of more complexity in the adder circuits [102].
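The tree view gives a second way to count adders and a simple way to total the ripple-carry delay. The sketch below assumes that level i of the tree holds $2^{n-i-1}$ adders of i bits each, which is the reading consistent with equation (15.4); the function names are illustrative.

```python
def adders_per_level(n):
    """Map level i -> number of i-bit adders in a (2**n - 1)-vote tally tree."""
    return {i: 2 ** (n - i - 1) for i in range(1, n)}


def full_adder_count(n):
    """Total 1-bit full adders: each i-bit adder contributes i of them."""
    return sum(i * count for i, count in adders_per_level(n).items())


def ripple_carry_delay(n, d=1.0):
    """Worst-case delay through the tree when each i-bit adder takes i*d."""
    return sum(i * d for i in range(1, n))
```

For n = 10 the tree contains 1013 full adders and has a worst-case ripple-carry delay of 45d, that is n(n − 1)/2 times the delay of one full adder.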

Equation (15.4) is most useful when the number of votes to count is one less than a
power of 2. In the last chapter, it was determined that the minimum block size should be
24x24, meaning that 576 votes need to be counted with 10 output bits to represent the
answer. Applying equation (15.4) with n = 10, we would need 1013 full adders to implement
this circuit, which is clearly more than are actually necessary. The most efficient design,
in terms of both area and interconnect requirements, is to build one 24-vote tallier per row
and to sum the outputs from each row with an adder tree, as shown in the block diagram
of Figure 15-1. We can build the row tallier with one 15-vote counter, requiring 11 adders;
one 7-vote counter, requiring 4 adders; and one adder to count the remaining two nodes.
One 3-bit adder and one 4-bit adder are then needed to combine the results for the entire
row, for a total of 23 1-bit full adders. Summing the results from all 24 rows requires 12
5-bit adders, 6 6-bit adders, 3 7-bit adders, and 2 8-bit adders, giving a total of 133 1-bit
adders. The entire circuit thus consists of 24 × 23 + 133 = 685 1-bit adders.
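The adder budget for the 24x24 array is simple arithmetic, reproduced below as a sketch so that the individual contributions are visible. The function names are hypothetical; the adder counts per stage are those given in the text.

```python
def counter_adders(n):
    """Full adders in a (2**n - 1)-vote counter, equation (15.4)."""
    return 2 ** n - n - 1


def row_tallier_adders():
    """24-vote row tallier: 15-vote and 7-vote counters, one adder for the
    last two nodes, then a 3-bit and a 4-bit adder to combine the results."""
    return counter_adders(4) + counter_adders(3) + 1 + 3 + 4


def summation_tree_adders():
    """Adder tree over 24 rows: 12 five-bit, 6 six-bit, 3 seven-bit, 2 eight-bit."""
    return 12 * 5 + 6 * 6 + 3 * 7 + 2 * 8


def total_adders(rows=24):
    """1-bit full adders in the complete tally circuit."""
    return rows * row_tallier_adders() + summation_tree_adders()
```

This gives 23 adders per row, 133 in the summation tree and 685 in total, compared with 1013 for a single 1023-vote counter.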

One possible layout for a single node of the array is shown in Figure 15-4, with the
equivalent circuit diagram in Figure 15-5. The top half of the layout contains the latch which
stores the bit b, the shift register where the bit from the search window is temporarily stored,
and the XOR circuit which computes the score for the node. The bottom half contains a
1-bit full adder and is laid out so that its width matches that of the top half of the cell as
closely as possible so as to minimize wasted space. A single row of the processor array, along
with its row tallier, can thus be formed by abutting 24 of the cells shown in the diagram,
with one missing the adder section. To implement the tally function, an additional twelve
horizontal lines of metal interconnect must be placed beneath the cell so that the inputs
and outputs of the adder circuits can be properly wired. As shown in the diagram, the cell
layout measures 340λ(h) × 260λ(w). With the additional metal lines, it will measure at
least 412λ(h) × 260λ(w).

The control lines carrying the clock signals for phasing the shift register and latching
the bits from the base image block run vertically across the cell so that rows can be abutted
to form the full array. The complete 24x24 array will thus measure at least 9888λ(h) ×
6240λ(w). In addition to the area taken by the array, the summation circuit to add the
results from all of the rows will require a minimum of 133 times the area of a single full
adder, which, as shown, measures 134λ(h) × 196λ(w).

It should be noted that the tally circuit just described uses ripple carry, and will thus
have worst-case propagation delays proportional to the square of the number of levels in
the summation tree, which in this case is five. The circuit can be made to run faster using
carry-lookahead adders. However, these require more area, and because their structure is
not as regular as that of the ripple carry adder, it would not be as simple to construct a
unit cell for the array such as the one in Figure 15-4.

15.2  Second  Method:  One  Processor  Per  Offset 

The second architecture for implementing the matching circuit is interesting both because
it can operate much faster than the first method, and because it has been used in the
design of some existing motion estimation chips currently on the market.

The basic idea of this design is illustrated in the block diagram of Figure 15-6. The array
consists of a regular arrangement of processors, each corresponding to a given offset in the
search window. Each processor consists of an XOR circuit to compute the score for one
pixel in the base image block, a counter/accumulator to compute the total score, and a shift
register cell to temporarily store one pixel from the search window, as shown in Figure 15-7.
The array is initialized by resetting all of the accumulators and loading the entire search
window into the column store blocks placed between the columns of the processor array. In
each processing cycle, one pixel from the base image block is broadcast on a global bus and
its difference is computed simultaneously with every pixel from the search window.

Figure 15-4: Layout of a unit cell including one full adder circuit.

Figure 15-5: Circuit diagram for the layout of Figure 15-4.

Figure 15-6: Offset processor array.

The base image pixels are sequenced by column, such that pixels (0, i) through (M − 1, i)
from column i are processed in order, followed by pixel (0, i + 1) from the top of the next
column. At the beginning of each column, the contents of the column stores, which each
hold one column from the search window, are copied into the vertical shift registers linking
the processor nodes. After each cycle, the search window pixels are shifted up one row so
that their offset relative to the incoming base pixel corresponds to the offset assigned to their
new location. Once the last pixel in the column from the base image has been processed, the
contents of the column store blocks are shifted horizontally, and the procedure is repeated
until the entire block has been processed. Assuming the processor array contains $W_1 \times W_2$
nodes, the minimum score and its offset can be found, once processing is completed, using
a comparator tree with at most $2^n - 1$ comparators, where $n = \lceil \log_2(W_1 W_2) \rceil$.

Figure 15-7: Block diagram of one processor node with 8-bit counter/accumulator cell.

Figure 15-8: Layout of 8-bit counter and accumulator with inhibit on overflow.

This  design  is  clearly  much  faster  than  the  one  discussed  in  the  preceding  section  as 
the  number  of  cycles  needed  to  find  the  best  offset  is  equal  to  the  number  of  pixels  in  the 
base  image  block,  instead  of  the  much  larger  number  of  offsets  within  the  search  window. 
The  price  paid  for  this  speed,  however,  is  silicon  area.  As  discussed  in  the  last  chapter, 
the  accumulators  at  each  node  need  to  represent  scores  with  8  bits  of  precision,  and  the 
search window needs to be large enough to accommodate the typical displacements caused by
the  camera  motion.  It  was  estimated  that  the  required  size  could  be  between  30x30  and 
200x200  pixels. 
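The second architecture can also be sketched in software. In the illustration below each entry of `acc` plays the role of one offset processor, and each iteration of the two outer loops corresponds to one broadcast cycle. The shift registers and column stores are not modeled; array indexing stands in for them. The function names are hypothetical, and the overflow inhibit of the 8-bit accumulator is ignored.

```python
def accumulate_scores(block, window):
    """Return a grid of scores, one accumulator per offset (dy, dx).

    block is M x M and window is (W1 + M - 1) x (W2 + M - 1), both 0/1 valued.
    """
    m = len(block)
    n_dy = len(window) - m + 1
    n_dx = len(window[0]) - m + 1
    acc = [[0] * n_dx for _ in range(n_dy)]   # accumulators reset to zero
    for c in range(m):                        # base pixels sequenced by column
        for r in range(m):
            b = block[r][c]                   # one pixel broadcast per cycle
            for dy in range(n_dy):            # all processors update in parallel
                for dx in range(n_dx):
                    acc[dy][dx] += b ^ window[dy + r][dx + c]
    return acc


def best_offset(acc):
    """Software stand-in for the comparator tree: minimum score and its offset."""
    return min(
        (score, (dy, dx))
        for dy, row in enumerate(acc)
        for dx, score in enumerate(row)
    )
```

The outer loops run once per base image pixel, so a 24x24 block takes 576 cycles regardless of the window size, whereas the first method needs one cycle per offset, that is 900 cycles for a 30x30 window and 40000 for a 200x200 window.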

A possible layout for an 8-bit counter and accumulator which could be used in the design
is shown in Figure 15-8 and corresponds to the section of the block diagram in Figure 15-7
below the XOR circuit. This cell measures 647λ(h) × 387λ(w) and includes a circuit to
inhibit counting if the carry-out bit of the accumulator goes high, signalling an overflow.
In this design, the 8-bit counter is implemented with two 4-bit counters, each containing
four 1-bit half-adders with carry-lookahead. Some area can be saved by removing the carry
propagation circuits (which can be seen in the layout as the rightmost elements of the
1-bit subcells) and connecting the carry-out bit directly to the input of the next cell. This
would decrease the width of the cell by 86λ, but will, of course, also reduce its operating
speed. Based on the size of the counter/accumulator alone, ignoring the space needed by
the column store and shift register subcells, we can see that a minimum size 30x30 array
would require at least 19410λ(h) × 11610λ(w). Using a 0.8 μm (λ = 0.4 μm) or smaller
process, we could conceivably build one 30x30 processor array on a single 1 cm die.
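The die-size estimate follows from scaling the cell dimensions by the array size and by λ. A small sketch of that conversion is given below; the helper name is an assumption of this example.

```python
def array_size_mm(rows, cols, cell_h_lambda, cell_w_lambda, lambda_um):
    """Return (height_mm, width_mm) of an array of identical abutted cells."""
    height_mm = rows * cell_h_lambda * lambda_um / 1000.0
    width_mm = cols * cell_w_lambda * lambda_um / 1000.0
    return height_mm, width_mm
```

With 30x30 cells of 647λ by 387λ and λ = 0.4 μm, the counter/accumulator array alone comes to roughly 7.8 mm by 4.6 mm, which leaves limited room on a 1 cm die for the column stores, shift registers and comparator tree.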


Chapter  16 


Mixing  Analog  and  Digital 


Of  the  two  architectures  discussed  in  the  last  chapter  for  a  purely  digital  design,  only  the 
first  one  readily  lends  itself  to  analog  processing.  In  the  second  architecture,  in  which  the 
scores  for  each  offset  in  the  search  window  are  computed  simultaneously  and  accumulated  as 
the base image block is read in, the principal processing element is the counter/accumulator.
If  designed  with  analog  circuits,  each  node  would  need  to  include  the  following  functions: 

1.  Scaling,  to  convert  the  binary  output  of  the  XOR  circuit  to  a  usably  small  unit  voltage 
or  current, 

2.  Addition  of  the  new  input  to  the  previous  value  of  the  score,  and 

3.  Storage  of  the  result. 

Since any mechanism to 'store' current also requires storing a control voltage, it is simpler
to build an analog counter/accumulator which would operate entirely in the voltage
domain. Adding voltages would require an opamp circuit with matched resistive elements,
as well as low output impedance buffers to make the inputs appear as ideal voltage sources.
Furthermore, care would need to be taken to ensure that the individual processors are
matched to better than ±3% of their full scale range in order to meet the precision
requirements outlined in Section 14.3. Even if it is possible to design the processors to these
specifications, they will still require substantial silicon area, and will certainly be more
expensive to fabricate than their digital equivalents.

In the first architecture, on the other hand, it is much less difficult to implement the
tally function with the required precision using simple analog circuits. Votes can be counted
by switching on current sources at each node when the output of the XOR circuit goes high.
The currents from each node can then be summed on a single wire and the result converted
to a voltage so that the total score can be compared with, and possibly replace, the stored
minimum score.

Figure 16-1: Processing array with both analog and digital elements.

The basic plan for a processor to implement the matching procedure using both analog
and digital elements is outlined in the block diagram of Figure 16-1. This diagram is
functionally identical to that of Figure 15-1. However, the row tally blocks and the summation
circuit have been replaced by wires, and the comparison functions included in the blocks
marked 'Min score & offset', 'Threshold test', and 'Validate' must now be implemented by
analog circuits.

In this chapter, I will discuss the design of the principal elements in Figure 16-1 which
involve analog processing and analyze their performance as predicted by simulations, using
the device parameters for the HP CMOS26 process, also offered through MOSIS. Layouts
were generated for each of these elements in order to compare total area requirements with
those of the corresponding digital implementation. Simulation results based on a circuit
extraction from the layout of a 5x5 array are given in the last section to illustrate the
ability of the mixed analog and digital processor to find an artificial test pattern in a 9x9
search window.

16.1  Unit  Cell  Design 

The primary change to the unit cell is the addition of switched current sources to replace
the full adder circuits used in the digital design. It was determined that better matching
for the purposes of the threshold test could be achieved with less complexity if in fact two
identical current sources were placed at each cell, one to be switched on when the value of
$s \oplus b$ is high, and the other when the value of b is high. The resulting circuit diagram and
layout for the unit cell are shown in Figures 16-2 and 16-3. The left three-quarters of the
layout containing the s-shift cell, b-latch and XOR circuit are identical to the top half of
the layout for the digital cell shown in Figure 15-4. The two current sources which occupy
the right one-fourth of the cell add 80λ to its width so that the total cell measures 161λ(h)
× 340λ(w), as opposed to the 412λ(h) × 260λ(w) used in the digital design. Including the
analog current sources thus reduces the cell area by almost a factor of two. Given that the
adder tree needed to sum the results from all of the rows is also no longer necessary, the
area required by the full array is only about one-third of that needed by the purely digital
implementation.

Figure 16-4: Transient behavior of switched current sources (plot title: Turn-on/Turn-off Behavior of Current Sources).

Figure 16-5: Output current vs. load voltage for unit cell current sources (plot title: Current Source Output vs. Output Voltage).
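The per-cell saving can be checked from the quoted dimensions. The sketch below compares only the unit cells, using the 161λ by 340λ and 412λ by 260λ figures; it does not attempt to reproduce the full-array estimate, which also depends on circuitry outside the cells. The function names are illustrative.

```python
def cell_area_lambda2(height_lambda, width_lambda):
    """Area of a rectangular cell in units of lambda squared."""
    return height_lambda * width_lambda


def digital_to_mixed_cell_ratio():
    """Ratio of the digital unit cell area to the mixed analog/digital one."""
    digital = cell_area_lambda2(412, 260)
    mixed = cell_area_lambda2(161, 340)
    return digital / mixed
```

The ratio comes out at about 1.96, consistent with the statement that the cell area is reduced by almost a factor of two.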

Three  considerations  were  important  in  determining  the  design  of  the  current  sources 
used  in  the  unit  cells.  The  first,  of  course,  was  to  achieve  good  matching.  The  second  goal 
was  to  maximize  the  range  of  load  voltages  over  which  the  sources  would  behave  as  ideal 
elements.  Finally,  it  was  important  to  minimize  the  output  rise  time  when  the  sources  were 
switched  on  in  order  to  increase  operating  speeds. 

Variations in the W/L ratios of the transistors are the only source of mismatch which can
be directly influenced by the layout. The width and length of all transistors were thus made
as large as possible so that minor process variations in line widths would have a smaller
percentage effect on the actual dimensions. Increasing L also helps improve the ideality of
the current sources as it decreases channel length modulation and therefore increases the
output resistance.

A simple p-type current mirror driven by a biased NMOS transistor was chosen for
the design because it requires minimal area and allows a maximum load voltage of
$V_{DD} - |V_{gs} - V_t|$. The bias transistor was sized at W/L = 12λ/24λ to give 3.37 μA of current
for 1.2V input bias. This value was chosen so that if all 576 current sources of the same
type in the 24x24 array were switched on, the maximum output current which would need
to be handled would be approximately 2mA. Since each source dissipates 33 μW of power
when on, the maximum power dissipation in the full array with all current sources on is
38mW. The sources are turned on when the gate voltages on the two transistors connected
in series between the bias transistor and the current mirror are brought high. The gate of
the transistor closest to the p-fet is connected to the control input, i.e., b or $s \oplus b$, while
the gate of the other switch transistor is connected to a signal, labeled here as CLK, which
periodically goes high.
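The current and power figures can be reproduced with a short calculation. The sketch below reads the unit current as 3.37 μA and the unit power as 33 μW, which are the values consistent with the stated 2 mA and 38 mW totals, and assumes two sources per cell as described in Section 16.1. The function name and parameter names are hypothetical.

```python
def array_current_budget(n_cells=576, unit_current_ua=3.37,
                         unit_power_uw=33.0, sources_per_cell=2):
    """Return (max_line_current_ma, max_power_mw) for the full array.

    The line current counts only sources of one type, since each type
    is summed on its own wire; the power counts every source.
    """
    max_line_current_ma = n_cells * unit_current_ua / 1000.0
    max_power_mw = n_cells * sources_per_cell * unit_power_uw / 1000.0
    return max_line_current_ma, max_power_mw
```

With the default values this gives about 1.94 mA on each summing wire and about 38 mW in total.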

The size of the current mirror transistors was determined both by the rise time
requirements and by the need to maximize both the load voltage range and the output resistance.
Small values of $|V_{gs} - V_t|$ are achieved by making W/L small while large output resistance
requires large values of L. Fast rise times, however, are achieved by reducing the gate
capacitance of the mirror transistors which needs to be charged when the current source is
switched on. An appropriate compromise between these conflicting needs was obtained by
choosing W/L = 6λ/20λ. As can be seen from the simulation results plotted in Figures 16-4
and 16-5, the current sources as designed behave ideally up to load voltages of 4.2V, while
the output rises to within 1% of its final value in 160ns.

The dashed lines in Figure 16-4 indicate the change in output current for a change of
±5% in the W/L ratio of either the bias transistor or one of the mirror transistors. The
resultant variation in output current is also approximately ±5%, from 3.55 μA to 3.2 μA. By
using large values for both W and L in all of these transistors, however, it is hoped that
the standard deviation of W/L variations will be much less than 5%. Furthermore, to the extent
that mismatches in the current sources are random and have zero mean value, summing the
outputs from different sources will tend to cancel individual variations. The net effect of
mismatch on the precision of the matching circuit should thus be minimal.
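The averaging argument can be made quantitative under a simple model that is an assumption of this sketch, not a result from the simulations: each source has an independent, zero-mean error with the same relative standard deviation. The error of a sum of N unit currents then grows as the square root of N, so its relative size shrinks as one over the square root of N.

```python
import math


def summed_mismatch(n_sources, relative_std):
    """Return (std of the sum in unit currents, relative std of the sum).

    Assumes independent, zero-mean, identically distributed source errors.
    """
    std_in_units = relative_std * math.sqrt(n_sources)
    relative_std_of_sum = relative_std / math.sqrt(n_sources)
    return std_in_units, relative_std_of_sum
```

For example, 144 sources with a pessimistic 5% individual mismatch would give an error of about 0.6 unit currents on the total, or roughly 0.4% of the sum, which is small compared with the 3 to 4 vote resolution discussed in Section 14.3.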

16.2  Global  Test  Circuits 

To complete the matching procedure, the scores generated at each offset must be compared
with both the threshold $\alpha_1\|B\|$ and the current minimum score. Since the minimum
score must be stored as a voltage, it is best to convert the currents to voltages at this point,
as indicated by the circled resistive elements in Figure 16-1, at the same time scaling the
total current, $I_B$, from the base edge pixels by a factor of $\alpha_1$ with respect to the total score
current $I_V$. Representing $\alpha_1\|B\|$ by the voltage $\alpha_1 V_B$ also simplifies the block validation
test (14.15), as the values of $V_h$ and $V_l$ can then be supplied as external voltages which can
be adjusted as needed.

Analog  circuits  are  thus  required  outside  the  unit  cells  for  current  scaling  and  conversion, 
as  well  as  for  comparing  and  storing  voltages.  Since  the  comparator  outputs  are  necessarily 
binary  signals  and  since  the  search  window  offset  values  must  clearly  be  represented  digitally, 
the  remaining  functions  needed  in  the  procedure  of  storing  candidate  offsets  and  computing 
the  maximum  spread  must  be  performed  with  digital  circuits. 

16.2.1  Current  scaling  and  conversion 

The circuits used for converting the currents $I_B$ and $I_V$ to the voltages $\alpha_1 V_B$ and $V_V$
are shown in Figures 16-6a and 16-6b. The only difference in these circuits is the size of the
diode-connected input transistor, which is twice as wide for $I_B$ as for $I_V$. Since numerous
simulations of the matching procedure on test image sequences have shown that best results
are obtained by setting $\alpha_1$ to its maximum value of 0.5, this value was hardwired
into the circuit. The current into the opposing transistor of the n-type current mirror in
Figure 16-6a is thus equal to $I_B/2$, to within the accuracy allowed by the fabrication.

Figure 16-6: I-V conversion circuits.

In order for the output voltages to be within the range required by the comparator
circuits, discussed in the following section, the currents $I_B/2$ and $I_V$ are then pulled through
a second p-type current mirror whose output branch feeds into a diode-connected n-fet
serving as the 'resistor'. The fact that this resistor is nonlinear is of no importance to the
design since comparing the values of $I_B/2$ and $I_V$ only requires that it be monotonic. It is
very important, however, for the two voltage conversion circuits to be well matched. Large
geometry transistors were thus used to reduce the percentage mismatch due to line width
variations in fabrication, and long channels, $L > 10\lambda$, were used on each of the transistors to
reduce channel length modulation. Because of the large current-carrying capacity required
to accommodate the maximum possible current of approximately 2mA, however, it was necessary to use
large $W/L$ ratios, which reduces the output resistance $r_o$ and thus increases the output
nonlinearity.

The simulated current divider characteristic for the $I_B$ circuit is shown in Figure 16-7.
The curve is linear with slope of exactly 1/2 for input currents up to 1.2mA. Beyond
this point, however, the slope decreases considerably as the transistor driving the p-current
mirror is pushed into the triode region. Since $I_B$ must be less than $I_{max}/2$, however, in
order to pass the block validation test, we are only concerned with the behavior of the
divider up to inputs of 1mA.

The I-V characteristic of the complete $I_B$ circuit for inputs of 0 to 1mA is shown in
Figure 16-8, while a similar curve for the $I_V$ circuit is given in Figure 16-9, but with a
different input axis. For the $I_V$ circuit, the output voltage is plotted as a function of the
number of processors responding, obtained by dividing the total current by 3.37μA, which is the unit
current from one source. For a score to pass the threshold test, it must be the case that
$I_V < \alpha_1 I_B < I_{max}/4$, and hence the maximum number of responding nodes for the score to
be a candidate is 576/4 = 144.

Figure 16-9: Output score voltage vs. number of processors with non-matching edge pixels.

Figure 16-10: Rise and fall times of $V_V$ for a 0.3mA input after switching on current
sources and before turning on reset transistor.
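The relation between score current and vote count can be sketched numerically. The following Python fragment is only an illustration of the arithmetic above, assuming a 24x24 array (576 nodes), the 3.37μA unit current, and the hardwired scale factor of 0.5; the function and constant names are hypothetical and do not come from the design.

```python
# Illustrative arithmetic only; names are hypothetical, values are from the text.
UNIT_CURRENT_A = 3.37e-6      # current contributed by one responding node
ARRAY_NODES = 24 * 24         # 576 processors in the full-size array
ALPHA_1 = 0.5                 # hardwired scale factor

def responding_nodes(total_current_a):
    """Number of nodes responding, recovered from the summed current."""
    return round(total_current_a / UNIT_CURRENT_A)

def full_scale_current_a():
    """Current when every node in the array responds (I_max)."""
    return ARRAY_NODES * UNIT_CURRENT_A

def max_candidate_nodes():
    """Largest vote count that can still pass the threshold test.

    The base current must stay below I_max/2, so the scaled threshold
    alpha_1 * I_B stays below I_max/4.
    """
    return int(ARRAY_NODES * ALPHA_1 / 2)

print(full_scale_current_a())     # close to the ~2mA quoted for the converters
print(max_candidate_nodes())
print(responding_nodes(0.3e-3))   # the 0.3mA transient test input
```

The full-scale value of roughly 1.94mA is consistent with the maximum current of about 2mA that the conversion circuits were sized to carry.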

As can be seen from the diagrams, the output voltages for both the $I_B$ and $I_V$ circuits
are between 1V and 3.2V when the inputs are within the acceptable range. The slope of
the curve for the $I_V$ circuit in its least steep portion is 11mV/vote. Since, according to the
precision requirements of Section 14.3, it is necessary to discriminate between scores that
differ by more than 3 or 4 votes, the resolution of the comparator circuit should thus
be better than 40mV.

The transient behavior of the $I_V$ circuit for a 0.3mA ramped input, corresponding
roughly to the rise time characteristic of the node sources, is shown in Figure 16-10. The
output voltage is able to follow the rising input without an appreciable delay such that
after 170ns, the output is stable at the final voltage. When the node sources are switched
off, however, the fall time is much slower since, as the gate voltage on the diode-connected
transistor decreases, there is less current to discharge the gate, and when the output voltage
reaches the transistor threshold, the drain current becomes negligible. It is thus necessary
to connect an additional reset transistor to bring the output back to zero at each cycle.
In Figures 16-6a and 16-6b, this reset transistor is shown with its gate connected to the
periodic waveform $\phi_1$. Timing of the different clock signals used in the matching circuit is
discussed in Section 16.3.

16.2.2  Validation  and  Vmin  tests 

The remaining analog circuit needed is the voltage comparator for performing the
validation tests and for finding the minimum score. For simplicity a single design, shown inside
the dashed box of Figure 16-11, was used for all of the tests. The two input voltages,
indicated as In1 and In2, are connected to the gates of two identical p-type source followers
and 1.13pF capacitors when the signal CLK (the same signal which switches
on the node current sources) goes high. The reset transistors are turned on once at the
very beginning of the matching procedure to charge the capacitors to an initial high value.
This operation is necessary to initialize the comparator in the $V_{min}$ circuit, but is irrelevant
for the other validation tests.
Figure 16-11: $V_{min}$ circuit.


Figure 16-13: Input/output characteristic for source followers used in comparator circuits.


The comparator operation is driven by the three clock signals $R_1$, $R_2$, and $R_3$ whose
timing relative to the CLK signal is shown in Figure 16-16. Initially, these signals are all
high while the output voltages $\alpha_1 V_B$ and $V_V$ rise to their final value. When CLK is brought
low, the capacitors and source follower inputs are isolated from the outputs of the $I_B$ and
$I_V$ conversion circuits. $R_1$, $R_2$, and $R_3$ are brought low at the same time as CLK so that
the output of the source followers can charge the gates of the n-type half latch and the two
inverters.

The simulated $V_{in}$-$V_{out}$ characteristic of the source followers with a bias voltage of
3.2V is shown in Figure 16-13. The source followers serve two purposes in the comparator
circuit. The first is to shift the input voltages upward so that the minimum input value is at
least 1V above the threshold of the latch transistors. The second is to buffer the input values
with a stable source that does not require intermediate switches. When $R_1$ is brought high,
the two latch transistors are both turned on. The one with the higher gate voltage, however,
will have the higher current, and thus its drain voltage will drop more quickly than that of
the other transistor. When one of the drain voltages drops below $|V_{in} - V_t|$, where $V_{in}$ is
the input voltage on the source follower connected to that side, the lower p-fet of the source
follower will turn off, and the bias transistor will supply the current needed to drive the
drain all the way to ground. The opposing latch transistor, whose gate is connected to the
drain which goes to 0V, will be turned off, and the output of the source follower connected
to its drain will return to its value prior to bringing $R_1$ high. The capacitors holding the
input voltages to the source followers are sized large enough so that any coupling through
the gate-source capacitance, $C_{gs}$, of the lower p-fets will be negligible.

Once the gate voltages of the latch transistors are stable, the signals $R_2$ and $R_3$, which
connect the outputs of the two inverters to the cross-coupled latch transistors, are brought
high. In order to avoid metastable states, these two signals are staggered so that the
comparator output cannot 'hang' if the two input voltages are identical. $R_2$ is brought high
before $R_3$ so that if the drain voltage on the left-hand side is slightly above the inversion
threshold, the right-hand side will be brought to ground, while if it is below the inversion
threshold, the opposite side will go to $V_{DD}$. Once $R_3$ is brought high, the comparison
operation is complete, and the outputs $Q$ and $\bar{Q}$ are binary signals such that $Q = V_{DD}$ if
In2 > In1 and $\bar{Q} = V_{DD}$ if In1 > In2.

The design of the comparator was based primarily on the needs of the $V_{min}$ circuit and
is more than adequate for the validation tests comparing $\alpha_1 V_B$ to $V_V$ and to the upper and
lower limits $V_h$ and $V_l$. The $V_{min}$ circuit is special, however, in that only one input voltage
is supplied. The current minimum score is stored on one of the 1.13pF capacitors and is
compared with each input value. If the new value is lower, it must then be stored as the
minimum score, and the circuit must signal that the minimum has changed so that the new
offset position can be latched.

The circuit configurations for performing these operations are shown in Figures 16-11
and 16-12. The input is gated to one side or the other of the comparator based on the results
of the last test by connecting the outputs $Q$ and $\bar{Q}$ to the gates of two pass transistors. If
$Q$ is high, the input on the right side was greater than that on the left in the last compare.
The next score voltage, $V_V$, will thus be gated to the In2 input and will be compared with
the value still held on the capacitor on the In1 side. If the new value is less than the old
stored value, $\bar{Q}$ will go high, causing the next input to be gated to the In1 side, while the
new minimum value is stored at In2.

The toggle circuit shown in Figure 16-12 is used to indicate a change in $V_{min}$. The
output Q is connected to an inverter via a pass transistor controlled by the signal CLK.
When CLK is low, which is the case at the end of a compare operation, the output of the
inverter is the value of Q from the previous compare. Connecting the outputs $Q$ and $\bar{Q}$ to
the XOR circuit as shown thus makes toggle = $Q \oplus Q_{old}$ while CLK is low. If the minimum
value changes, the output signal toggle will remain high until CLK again goes high.

Figure 16-14: Test pattern for simulating the 5x5 array.
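The gating and toggle behavior described in this section can be summarized with a small behavioral model. The Python sketch below is not a circuit simulation: it assumes ideal comparisons, treats ties as "no change" (an assumption, since the text only says staggering the clocks prevents the output from hanging), and uses hypothetical names throughout.

```python
# Behavioral sketch of the V_min tracking loop; all names are hypothetical.
def track_minimum(scores, v_init=float("inf")):
    """Return (offset of minimum, minimum value, toggle flag per offset)."""
    caps = [v_init, v_init]   # voltages held on the In1 and In2 capacitors
    q = True                  # Q high: In2 held the larger value, In1 the minimum
    best_offset = None
    toggles = []
    for offset, score in enumerate(scores):
        write_side = 1 if q else 0        # new score overwrites the larger side
        keep_side = 1 - write_side        # the stored minimum is left untouched
        caps[write_side] = score
        new_is_lower = caps[write_side] < caps[keep_side]
        q_new = (not q) if new_is_lower else q
        toggle = q_new != q               # toggle = Q xor Q_old
        if toggle:
            best_offset = offset          # latch the offset of the new minimum
        toggles.append(toggle)
        q = q_new
    return best_offset, min(caps), toggles

best, v_min, toggles = track_minimum([5, 7, 3, 3, 9, 1, 4])
print(best, v_min, toggles)
```

In this model the toggle output rises exactly at the offsets where a strictly lower score arrives, so the last rising toggle marks the offset that should remain latched.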

16.3  Test  Pattern  Simulation 

To test the operation of the complete matching circuit in finding the best offset for an
actual edge pattern, a full simulation was conducted with a 5x5 array and a 9x9 search
window. The test patterns used in the simulation for the base edge block and the search
window are shown in Figure 16-14. At the correct offset, indicated by the dashed lines, the
5x5 edge pattern in the search window matches that of the test block exactly except for
one pixel in the lower left-hand corner which is different. To find the best offset, however,
the matching circuit must test each of the 25 different offset positions in which the 5x5
block is entirely contained in the search window and correctly find the minimum score at
the indicated location.

Figure 16-16: Clock waveforms.
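The search performed in this test can be mimicked in software. The sketch below assumes that the score at each offset is the count of non-matching binary edge pixels, as suggested by the caption of Figure 16-9; the test pattern it generates is hypothetical and is not the pattern of Figure 16-14.

```python
# Software analogue of the exhaustive offset search; pattern data is made up.
def make_window(size=9):
    """Deterministic binary 'edge' pattern used only for illustration."""
    return [[1 if (7 * r + 3 * c + r * c) % 4 == 0 else 0 for c in range(size)]
            for r in range(size)]

def extract_block(window, row, col, size=5):
    """Copy a size x size block whose top-left corner is at (row, col)."""
    return [line[col:col + size] for line in window[row:row + size]]

def match_scores(block, window):
    """Mismatch count at every offset where the block fits inside the window."""
    n, m = len(block), len(window)
    scores = {}
    for dr in range(m - n + 1):
        for dc in range(m - n + 1):
            scores[(dr, dc)] = sum(block[i][j] != window[dr + i][dc + j]
                                   for i in range(n) for j in range(n))
    return scores

window = make_window()
block = extract_block(window, 2, 2)
block[4][0] ^= 1                      # flip one pixel in the lower left corner
scores = match_scores(block, window)
best_offset = min(scores, key=scores.get)
print(len(scores), scores[(2, 2)], best_offset)
```

A 5x5 block inside a 9x9 window gives (9 - 5 + 1)^2 = 25 candidate offsets, and the offset from which the block was taken scores exactly one mismatch.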

A layout for the full circuit is shown in Figure 16-15 and includes the 5x5 matching
array, the $I_B$ and $I_V$ scaling and voltage conversion circuits, the $V_V$-$\alpha_1 V_B$ comparator, and
the $V_{min}$ circuit, as well as an additional four rows containing a 4x5 s-shift array so that
an entire column of the search window can be loaded in one input cycle. The metal1 lines
running horizontally at the base and top of the array carry the necessary control signals
and bias voltages for operating the matching circuit. The layout shown measures 2007λ(h)
× 1779λ(w) of which 608λ in height are occupied by the optional 4x5 s-shift array and
172λ in height are taken by the conversion and validation circuits.

The waveforms for the principal periodic control signals are shown in Figure 16-16. The
simulation was based on a 40ns minimum pulse width with 400ns required to process each
offset. The clock signals $\phi_1$ and $\phi_2$ control vertical movement on the shift register, while
clocks $\phi_3$ and $\phi_4$ control horizontal movement. The shift sequence is such that the block is
first aligned with a particular column in the search window, after which each row offset is
processed sequentially. Once the last row offset has been processed, the entire search window
is shifted horizontally by one column with a new column being read in as the farthest one
on the block is shifted out. For the present configuration, which has five row offsets per
column, there are thus four vertical shifts for every one horizontal shift. The 400ns process
time for each offset is due to the 160ns needed to complete each shift (since the clock phases
cannot overlap), 200ns to allow adequate settling of the node current sources, and 40ns to
reset the clock signals $R_1$, $R_2$, and $R_3$ used in the comparator circuits.

Figure 16-17: $V_V$ and $\alpha_1 V_B$ outputs sampled at end of each cycle.
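The per-offset timing budget can be tallied directly. This short Python sketch only restates the figures given above (160ns shift, 200ns settling, 40ns comparator reset); the function names are hypothetical.

```python
# Timing budget per offset, using the durations quoted in the text.
SHIFT_NS = 160              # non-overlapping clock phases for one shift
SETTLE_NS = 200             # settling of the node current sources
COMPARATOR_RESET_NS = 40    # resetting R1, R2, and R3

def offset_time_ns():
    """Total processing time for a single offset."""
    return SHIFT_NS + SETTLE_NS + COMPARATOR_RESET_NS

def offsets_per_side(block, window):
    """Number of offsets along one axis where the block fits in the window."""
    return window - block + 1

def search_time_ns(block, window):
    """Time to visit every offset of a square block in a square window."""
    return offsets_per_side(block, window) ** 2 * offset_time_ns()

rows = offsets_per_side(5, 9)
print(offset_time_ns(), rows, rows - 1, search_time_ns(5, 9))
```

For the 5x5 block in the 9x9 window there are five row offsets per column, hence four vertical shifts between horizontal shifts, and the 25 offsets take 10μs in total.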

The results of the complete simulation are plotted in Figures 16-17 through 16-19. It is
clear from the first diagram, which shows the peak values of $\alpha_1 V_B$ and $V_V$ sampled at the
end of each processing cycle, that shift position #13 corresponds to the minimum score.
The minimum value of $V_V$ is not only significantly less than all of the other scores, but
it is also the only one which is less than $\alpha_1 V_B$. The $V_V$-$\alpha_1 V_B$ comparator output, plotted in
Figure 16-18, confirms this fact as offset #13 is the only one to produce a high output. The
result of the $V_{min}$ circuit, given by the toggle output plotted in Figure 16-19, shows that
the minimum score fluctuates a few times at first, but then rises for the last time when the
correct offset is reached.
Figure 16-18: Response of threshold test circuit at each offset of test pattern.


Figure  16-19:  Response  of  toggle  circuit  at  each  offset  of  test  pattern. 
16.4  Comparison  with  Digital  Architectures

The simulation results from the test pattern indicate that the mixed analog and digital
matching circuit functions correctly according to its design. It requires 400ns to process
each offset and can thus search a 50x50 window in 1ms. A 24x24 array processor, including
the comparator and $V_{min}$ circuits, would occupy 4432λ(h) × 8160λ(w), based on the layouts
shown for the unit cell and 5x5 array. In a 0.8μm (λ = 0.4μm) process, one matching circuit
would require 1.77mm × 3.26mm, and thus eight individual circuits could comfortably fit
on a single 1cm die.

In the last chapter, it was estimated that the corresponding 24x24 digital processor
required 9888λ(h) × 6240λ(w) for the array alone, and at least another 400λ in width to
accommodate the summation circuit which adds the results from each row. Digital versions of
the comparator and $V_{min}$ circuits were not designed or laid out for this processor; however,
the area taken by these components should also be considered. In a 0.8μm process, the
array and summation circuits together require at least 3.95mm × 2.66mm, and thus, at
best, four processors might fit on a 1cm die. The time required per offset was not estimated
for the digital processor. However, it is not at all clear that it could be operated faster than
the mixed analog/digital design. Since the same time (200ns) is required by both circuits
for shifting the search window, the difference in their speeds is determined by the time
required to compute each score and compare it to the minimum value. For the fully digital
processor to be faster, the tally circuit and $V_{min}$ comparator would have to be designed to
complete these operations in less than the 200ns required by the analog circuit.
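The area and timing figures quoted for the two 24x24 processors follow from simple unit conversions, sketched below in Python. The layout dimensions and λ = 0.4μm are taken from the text; the helper names are hypothetical, and the digital figure covers only the array and summation circuit, as stated above.

```python
# Unit conversions behind the area comparison; helper names are hypothetical.
LAMBDA_UM = 0.4   # lambda for a 0.8 micron process

def lambda_to_mm(length_in_lambda):
    """Convert a layout dimension in lambda units to millimetres."""
    return length_in_lambda * LAMBDA_UM / 1000.0

def area_mm2(dims):
    return dims[0] * dims[1]

mixed_dims = (lambda_to_mm(4432), lambda_to_mm(8160))          # height, width
digital_dims = (lambda_to_mm(9888), lambda_to_mm(6240 + 400))  # array + summation

area_ratio = area_mm2(digital_dims) / area_mm2(mixed_dims)
window_search_ms = 50 * 50 * 400e-9 * 1e3   # 2500 offsets at 400ns each

print(mixed_dims, digital_dims)
print(round(area_ratio, 2), window_search_ms)
```

Under these numbers the digital array alone is a little under twice the area of the complete mixed analog/digital circuit, before any digital comparator or minimum-tracking logic is added.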

The  second  digital  architecture  studied  is  much  faster  than  the  first  one,  as  its  total 
processing  time  is  determined  by  the  size  of  the  block  being  matched  and  not  that  of  the 
search  window.  Furthermore  the  computations  for  each  edge  pixel  only  require  updating 
the  8-bit  accumulators  at  each  array  node,  and  thus  the  delays  are  much  shorter.  The  major 
disadvantages  of  this  design  are  its  large  size  and  the  fact  that  the  dimensions  of  the  array 
limit  the  maximum  search  area.  It  was  estimated  at  the  end  of  the  last  chapter  that  one 
could  at  best  fit  a  single  30x30  array  on  a  1cm  die. 

In summary, the mixed analog/digital matching processor does appear to best meet the
needs of the motion estimation system as they have been formulated. This processor has
an 8-to-1 area advantage over the digital circuits used for motion estimation in commercial
systems (i.e., the second architecture) and does not restrict the search window size. It has,
at least, a 2-to-1 area advantage over its direct digital equivalent and can be operated at
comparable, or better, speeds. The maximum power dissipation of each 24x24 processor
during the 200ns computation cycle is estimated at < 40mW in the worst possible case
when all internal current sources are turned on.
Chapter  17


Recommendations  for  Putting  It  All  Together 


It  is  appropriate  to  conclude  by  combining  the  results  of  the  previous  chapters  into  a 
plan  for  the  complete  system  design.  One  possible  configuration  of  a  single  ‘board’  motion 
estimation  system,  including  the  multi-scale  veto  processor  and  the  analog/digital  edge 
matching  circuits  developed  in  this  thesis,  is  shown  in  Figure  17-1.  From  the  analysis  of 
Chapter  8,  we  know  that  the  spatial  resolution  of  the  image  sensor  must  be  sufficiently  fine 
so  that,  given  the  block  size  used  in  the  matching  procedure,  each  block  can  be  reasonably 
approximated  as  a  single  point.  From  tests  with  real  image  sequences,  it  has  been  seen 
that  the  minimum  sensor  size  needed  to  obtain  accurate  motion  estimates  is  approximately 
256x256  pixels. 

Following the recommendations of Chapter 13, it should be possible, with a 0.8μm
or better CCD/CMOS process, to build a 256x256 MSV focal-plane processor which will
perform both imaging and edge detection. The advantage of the focal-plane processor is
that the signal degradation incurred in loading the image one pixel at a time through the
fill-and-spill input structure and transferring the charges over the length of the gate array is
avoided. The improved design does not, however, offer any savings in total processing time,
as the time not spent in loading the image is amply made up for in sequentially performing
the differencing and threshold tests for each row and column. Assuming the processor is
operated at 5MHz, and including a normal image acquisition time of 1ms, it should take
approximately 5ms for the edge detector to process each frame and deliver the binary edge
outputs to the memory buffers.

As discussed in Section 11.5.2, most of the power required to operate a CCD array is
dissipated in the clock drivers. To minimize power requirements, a separate chip containing
specially designed drivers, tuned for the capacitive and inductive loads of the system, should
be included to supply the clock waveforms for the MSV processor. Tuned drivers can reduce
power dissipation by a factor of $1/Q_f$, where $Q_f$ is the quality factor of the oscillator.

Figure 17-1: Single 'board' motion estimation system.

To  obtain  accurate  motion  estimates,  it  is  essential  to  try  to  match  a  large  number  of 
blocks  in  order  to  ensure  that  a  sufficient  number  of  correspondence  points  will  be  found. 
In  the  numerous  tests  performed  with  real  image  sequences,  it  was  often  observed  that  the 
hit  rate,  i.e.,  the  fraction  of  blocks  having  acceptable  matches,  was  very  low  in  ordinary 
scenes  due  to  a  combination  of  the  lack  of  distinctive  features  and  the  frequent  occurrence 
of  repeating  patterns  which  give  multiple  candidate  matches.  It  was  typical  to  obtain  only 
30-50  correspondences  out  of  more  than  400  blocks  tested. 

It  would  not  be  practical  to  include  400  matching  circuits  on  the  motion  system  board 
both  for  reasons  of  size  and  of  power  consumption — even  if  8  individual  circuits  are  contained 
in  each  chip.  The  proposed  configuration  of  Figure  17-1  holds  four  chips,  such  that  a  total  of 
32  blocks  can  be  matched  simultaneously.  By  pipelining  the  matching  operations,  448  blocks 
can  be  tested  in  14  cycles.  At  the  end  of  Chapter  16,  it  was  calculated  that  a  50x50  window 
could  be  searched  in  1ms.  Taking  this  as  the  average  window  size,  the  entire  matching 
process  can  thus  be  completed  in  14ms  plus  some  additional  time  to  manage  operational 
overhead.  Assuming  that  imaging  and  edge  detection  require  5ms,  and  estimating  that 
roughly  10ms  are  needed  to  both  solve  the  motion  equations  and  perform  other  housekeeping 
tasks,  the  complete  system  should  be  able  to  process  each  image  pair  in  just  over  29ms  and 
thus  achieve  a  throughput  of  slightly  better  than  30  frames/sec. 
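The frame-time estimate above can be reproduced with a few lines of arithmetic. The Python sketch below assumes the figures quoted in this chapter (four chips of eight circuits, 1ms per 50x50 window, 5ms for imaging and edge detection, roughly 10ms for the motion computation) and ignores the unspecified operational overhead; all names are hypothetical.

```python
import math

# Frame-time budget from the figures quoted in the text; names are hypothetical.
CHIPS = 4
CIRCUITS_PER_CHIP = 8
WINDOW_SEARCH_MS = 1.0      # one 50x50 search window per matching cycle
EDGE_DETECTION_MS = 5.0     # imaging plus edge detection
MOTION_SOLVE_MS = 10.0      # motion equations and housekeeping (rough estimate)

def matching_cycles(n_blocks):
    """Pipelined cycles needed when 32 blocks are matched at a time."""
    return math.ceil(n_blocks / (CHIPS * CIRCUITS_PER_CHIP))

def frame_time_ms(n_blocks):
    """Time per image pair, excluding operational overhead."""
    return (EDGE_DETECTION_MS
            + matching_cycles(n_blocks) * WINDOW_SEARCH_MS
            + MOTION_SOLVE_MS)

def frames_per_second(n_blocks):
    return 1000.0 / frame_time_ms(n_blocks)

print(matching_cycles(448), frame_time_ms(448), frames_per_second(448))
```

With 448 blocks the budget comes to 29ms before overhead, which leaves a few milliseconds of margin at a 30 frames/sec rate.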

The final important component of the system is the micro-controller which sequences the
operations on the board and also performs the motion computations. This component was
hardly covered in this thesis as there are many commercially available digital processors
which could be used for this purpose. It is important, however, to choose one which is
adequate for the task but does not include unnecessary functions which will increase power
consumption, and possibly the cost of the processor itself.

There are, of course, many issues left to be resolved to complete the motion system. The
proposed new design of the MSV edge detector should be fabricated in a better-controlled
and smaller scale CCD process, and then tested to evaluate its performance. Full-size
matching circuits should also be fabricated to compare their actual performance with that
predicted by simulation. The tuned clock driver chip for controlling the MSV processor
must be designed, and an adequate, but minimal, microprocessor should be chosen to
control the system.

Constructing the full system is outside the scope of any one thesis. The major
contribution of this research has been to thoroughly analyze the theoretical and algorithmic
constraints imposed on the system and, given these constraints, to carefully study the
design of its two most important components. The results developed in this thesis have thus
established the foundation which will serve as the basis of further work for building the
complete motion estimation system according to the goals set for its design.
Appendix  A


Quaternion  Algebra 


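Quaternions are vectors in $\mathbb{R}^4$ which may be thought of as the composition of a scalar
and 'vector' part [4].

$$\mathring{a} = (a_0, \mathbf{a}) \tag{A.1}$$

where $\mathbf{a} = (a_x, a_y, a_z)^T$ is a vector in $\mathbb{R}^3$.
Conjugation is defined by

$$\mathring{a}^* = (a_0, -\mathbf{a}) \tag{A.2}$$

and the multiplication of two quaternions is defined by

$$\mathring{a}\mathring{b} = (a_0 b_0 - \mathbf{a}\cdot\mathbf{b},\; a_0\mathbf{b} + b_0\mathbf{a} + \mathbf{a}\times\mathbf{b}) \tag{A.3}$$

Quaternion multiplication is associative, but not commutative. The identity with respect
to multiplication is

$$\mathring{e} = (1, \mathbf{0}) \tag{A.4}$$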

In all other respects, quaternions may be treated as ordinary vectors. The transpose,
dot product, and multiplication by a matrix are defined in the usual manner. One way to
express the same operation in (A.3) is by a matrix-vector multiplication using equivalent
quaternion matrices. For example,

$$\mathring{a}\mathring{b} = \begin{pmatrix} a_0 & -a_x & -a_y & -a_z \\ a_x & a_0 & -a_z & a_y \\ a_y & a_z & a_0 & -a_x \\ a_z & -a_y & a_x & a_0 \end{pmatrix}\mathring{b} = \mathbb{A}\,\mathring{b} \tag{A.5}$$

$$\mathring{a}\mathring{b} = \begin{pmatrix} b_0 & -b_x & -b_y & -b_z \\ b_x & b_0 & b_z & -b_y \\ b_y & -b_z & b_0 & b_x \\ b_z & b_y & -b_x & b_0 \end{pmatrix}\mathring{a} = \bar{\mathbb{B}}\,\mathring{a} \tag{A.6}$$

$\mathbb{A}$ is referred to as the left quaternion matrix associated with $\mathring{a}$, and $\bar{\mathbb{B}}$ is referred to as
the right quaternion matrix associated with $\mathring{b}$. Quaternion matrices are useful for
rearranging formulas into more convenient expressions. Using either (A.3) or the matrix
formulations (A.5) and (A.6), the following identities can be easily shown


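$$\mathring{a}\mathring{a}^* = (\mathring{a}\cdot\mathring{a})\,\mathring{e} \tag{A.7}$$

$$(\mathring{a}\mathring{b})^* = \mathring{b}^*\mathring{a}^* \tag{A.8}$$

$$(\mathring{a}\mathring{q})\cdot(\mathring{b}\mathring{q}) = (\mathring{a}\cdot\mathring{b})(\mathring{q}\cdot\mathring{q}) \tag{A.9}$$

$$(\mathring{a}\mathring{q})\cdot\mathring{b} = \mathring{a}\cdot(\mathring{b}\mathring{q}^*) \tag{A.10}$$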

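A vector in $\mathbb{R}^3$ can be represented by a quaternion with zero scalar part. If $\mathring{r}$, $\mathring{\ell}$, and $\mathring{b}$
are quaternions with zero scalar part then

$$\mathring{r}^* = -\mathring{r} \tag{A.11}$$

$$\mathring{r}\cdot\mathring{\ell} = \mathbf{r}\cdot\boldsymbol{\ell} \tag{A.12}$$

$$\mathring{r}\mathring{\ell} = (-\mathbf{r}\cdot\boldsymbol{\ell},\; \mathbf{r}\times\boldsymbol{\ell}) \tag{A.13}$$

$$\mathring{b}\cdot(\mathring{r}\mathring{\ell}) = \mathbf{b}\cdot(\mathbf{r}\times\boldsymbol{\ell}) \tag{A.14}$$

The last identity above is the quaternion representation of the triple product.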

The reason that quaternions are so useful is the simplicity with which rotation
about an arbitrary axis can be represented. A unit quaternion is one whose magnitude,
defined as the square root of its dot product with itself, is unity.

$$\mathring{q}\cdot\mathring{q} = 1 \tag{A.15}$$

Every unit quaternion represents a rotation in $\mathbb{R}^3$ of an angle $\theta$ about an axis $\hat{\boldsymbol{\omega}}$ in the
sense that

$$\mathring{q} = \left(\cos\frac{\theta}{2},\; \hat{\boldsymbol{\omega}}\sin\frac{\theta}{2}\right) \tag{A.16}$$


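The rotational transformation of a vector $\boldsymbol{\ell}$ is found from

$$\mathring{\ell}' = \mathring{q}\,\mathring{\ell}\,\mathring{q}^* \tag{A.17}$$

where $\mathring{\ell}'$ is the quaternion representation of the rotated vector. Using quaternion
matrices, we can also write (A.17) as

$$\mathring{\ell}' = \mathbb{Q}\,\bar{\mathbb{Q}}^{*}\,\mathring{\ell} = \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & R \end{pmatrix}\mathring{\ell} \tag{A.18}$$

where $\mathbb{Q}$ is the left quaternion matrix associated with $\mathring{q}$, $\bar{\mathbb{Q}}^{*}$ is the right quaternion matrix
associated with $\mathring{q}^*$, and $R$ is an orthonormal rotation matrix. Expanding terms we find

$$R = \begin{pmatrix} q_0^2 + q_x^2 - q_y^2 - q_z^2 & 2(q_x q_y - q_0 q_z) & 2(q_x q_z + q_0 q_y) \\ 2(q_x q_y + q_0 q_z) & q_0^2 - q_x^2 + q_y^2 - q_z^2 & 2(q_y q_z - q_0 q_x) \\ 2(q_x q_z - q_0 q_y) & 2(q_y q_z + q_0 q_x) & q_0^2 - q_x^2 - q_y^2 + q_z^2 \end{pmatrix} \tag{A.19}$$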


If we are given an orthonormal rotation matrix $R$, the corresponding unit quaternion can
be found by noting that

$$q_0^2 = \cos^2\frac{\theta}{2} = \frac{\mathrm{Tr}(R) + 1}{4} \tag{A.20}$$

and by solving the eigenvector equation

$$R\,\hat{\boldsymbol{\omega}} = \hat{\boldsymbol{\omega}} \tag{A.21}$$


Additional  results  on  quaternions  and  their  properties  can  be  found  in  [88]  and  [4]. 
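The product rule (A.3), the rotation formula (A.17), and the matrix (A.19) are easy to check numerically. The following Python sketch stores a quaternion as a plain 4-tuple (scalar part first); the function names are hypothetical and the code is only a small verification aid, not part of the system described in this thesis.

```python
import math

# Minimal quaternion helpers; a quaternion is a tuple (q0, qx, qy, qz).
def qmul(a, b):
    """Quaternion product following (A.3)."""
    a0, ax, ay, az = a
    b0, bx, by, bz = b
    return (a0 * b0 - ax * bx - ay * by - az * bz,
            a0 * bx + b0 * ax + ay * bz - az * by,
            a0 * by + b0 * ay + az * bx - ax * bz,
            a0 * bz + b0 * az + ax * by - ay * bx)

def qconj(a):
    """Conjugate following (A.2)."""
    return (a[0], -a[1], -a[2], -a[3])

def unit_quaternion(axis, theta):
    """Unit quaternion for a rotation of theta about axis, as in (A.16)."""
    norm = math.sqrt(sum(c * c for c in axis))
    s = math.sin(theta / 2.0) / norm
    return (math.cos(theta / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate(q, v):
    """Rotate the 3-vector v using q v q*, as in (A.17)."""
    return qmul(qmul(q, (0.0, v[0], v[1], v[2])), qconj(q))[1:]

def rotation_matrix(q):
    """Rotation matrix of (A.19) for a unit quaternion q."""
    q0, qx, qy, qz = q
    return [[q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
            [2*(qx*qy + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
            [2*(qx*qz - q0*qy), 2*(qy*qz + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz]]

q = unit_quaternion((0.0, 0.0, 1.0), math.pi / 2.0)   # 90 degrees about z
rotated = rotate(q, (1.0, 0.0, 0.0))
R = rotation_matrix(q)
trace = R[0][0] + R[1][1] + R[2][2]
print(rotated, (trace + 1.0) / 4.0, q[0] ** 2)
```

Rotating the x axis by 90 degrees about z gives the y axis, the first column of R agrees with that result, and the trace relation (A.20) recovers the squared scalar part.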


Appendix  B 


Special  Integrals 


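In the analysis of the numerical stability and error sensitivity of the motion estimates,
it is necessary to compute the following integrals:

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\xi\,d\xi\,d\alpha \tag{B.1}$$

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^4\,\xi\,d\xi\,d\alpha \tag{B.2}$$

$$\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha \tag{B.3}$$

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha \tag{B.4}$$

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha \tag{B.5}$$

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha \tag{B.6}$$

where $\mathbf{u}$, $\mathbf{u}_1$, and $\mathbf{u}_2$ are arbitrary vectors and

$$\boldsymbol{\ell}' = R\,\boldsymbol{\ell} \tag{B.7}$$

with $\boldsymbol{\ell}$ being the vector from the center of projection to a point $(\xi\cos\alpha,\ \xi\sin\alpha)$ on the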
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^  da 


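Direct multiplication gives

$$|\boldsymbol{\ell}|^4 = (1 + \xi^2)^2 = 1 + 2\xi^2 + \xi^4 \tag{B.13}$$

and the integral is straightforward:

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^4\,\xi\,d\xi\,d\alpha = \int_0^{2\pi}\!\!\int_0^D |\boldsymbol{\ell}|^4\,\xi\,d\xi\,d\alpha = 2\pi\int_0^D \left(\xi + 2\xi^3 + \xi^5\right)d\xi = \pi D^2\left(1 + D^2 + \frac{D^4}{3}\right) \tag{B.14}$$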


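B.3  $\int\!\!\int \boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha$

We start by writing $\boldsymbol{\ell}'\boldsymbol{\ell}'^T$ as

$$\boldsymbol{\ell}'\boldsymbol{\ell}'^T = R\,\boldsymbol{\ell}\boldsymbol{\ell}^T R^T \tag{B.15}$$

$$\boldsymbol{\ell}\boldsymbol{\ell}^T = \begin{pmatrix} \xi^2\cos^2\alpha & \xi^2\cos\alpha\sin\alpha & \xi\cos\alpha \\ \xi^2\cos\alpha\sin\alpha & \xi^2\sin^2\alpha & \xi\sin\alpha \\ \xi\cos\alpha & \xi\sin\alpha & 1 \end{pmatrix} \tag{B.16}$$

Moving the terms $R$ and $R^T$ outside the integral and integrating over $\alpha$ we obtain

$$\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\,d\alpha = \begin{pmatrix} \pi\xi^2 & 0 & 0 \\ 0 & \pi\xi^2 & 0 \\ 0 & 0 & 2\pi \end{pmatrix} \tag{B.17}$$

The integral over $\xi$ then gives

$$\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\,\xi\,d\xi\,d\alpha = \begin{pmatrix} \pi D^4/4 & 0 & 0 \\ 0 & \pi D^4/4 & 0 \\ 0 & 0 & \pi D^2 \end{pmatrix} \tag{B.18}$$

$$= \pi D^2\left(\frac{D^2}{4}\,I + \left(1 - \frac{D^2}{4}\right)\hat{\mathbf{z}}\hat{\mathbf{z}}^T\right), \qquad \hat{\mathbf{z}} = (0, 0, 1)^T \tag{B.19}$$

Combining (B.19) and (B.11) we obtain finally

$$\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha = R\left(\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\,\xi\,d\xi\,d\alpha\right)R^T = \pi D^2\left\{\frac{D^2}{4}\left[I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^T\right] + \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^T\right\} \tag{B.20}$$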
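A quick numerical check of the unrotated result is possible with a simple midpoint rule. The Python sketch below assumes the parameterization used in this appendix, with the vector from the center of projection written as (ξ cos α, ξ sin α, 1), and compares the quadrature against the diagonal closed form πD² diag(D²/4, D²/4, 1); the function names are hypothetical.

```python
import math

# Midpoint-rule check of the integral of l l^T xi dxi dalpha over a disc of radius D.
def integrate_outer_product(D, n_xi=200, n_alpha=64):
    total = [[0.0] * 3 for _ in range(3)]
    d_xi = D / n_xi
    d_alpha = 2.0 * math.pi / n_alpha
    for i in range(n_xi):
        xi = (i + 0.5) * d_xi
        for k in range(n_alpha):
            alpha = (k + 0.5) * d_alpha
            ell = (xi * math.cos(alpha), xi * math.sin(alpha), 1.0)
            weight = xi * d_xi * d_alpha          # area element xi dxi dalpha
            for r in range(3):
                for c in range(3):
                    total[r][c] += ell[r] * ell[c] * weight
    return total

def closed_form(D):
    """pi D^2 diag(D^2/4, D^2/4, 1), the unrotated closed form."""
    a = math.pi * D ** 4 / 4.0
    return [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, math.pi * D ** 2]]

numeric = integrate_outer_product(0.5)
exact = closed_form(0.5)
max_error = max(abs(numeric[r][c] - exact[r][c]) for r in range(3) for c in range(3))
print(max_error)
```

The rotated integral then follows by pre- and post-multiplying with R and its transpose, which leaves the agreement unchanged.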


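B.4  $\int\!\!\int |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha$

The only difference between this integral and the previous one is the presence of the
term $|\boldsymbol{\ell}'|^2 = 1 + \xi^2$. Since this term does not depend on $\alpha$ we can proceed as before
to find

$$\int_0^{2\pi} |\boldsymbol{\ell}|^2\,\boldsymbol{\ell}\boldsymbol{\ell}^T\,d\alpha = \begin{pmatrix} \pi\xi^2(1 + \xi^2) & 0 & 0 \\ 0 & \pi\xi^2(1 + \xi^2) & 0 \\ 0 & 0 & 2\pi(1 + \xi^2) \end{pmatrix} \tag{B.21}$$

Integrating over $\xi$ we then have

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}|^2\,\boldsymbol{\ell}\boldsymbol{\ell}^T\,\xi\,d\xi\,d\alpha = \begin{pmatrix} \pi\left(\frac{D^4}{4} + \frac{D^6}{6}\right) & 0 & 0 \\ 0 & \pi\left(\frac{D^4}{4} + \frac{D^6}{6}\right) & 0 \\ 0 & 0 & \pi\left(D^2 + \frac{D^4}{2}\right) \end{pmatrix} \tag{B.22}$$

Including the rotation, we obtain

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\xi\,d\alpha = R\left(\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}|^2\,\boldsymbol{\ell}\boldsymbol{\ell}^T\,\xi\,d\xi\,d\alpha\right)R^T = \pi D^2\left[\left(\frac{D^2}{4} + \frac{D^4}{6}\right)I + \left(1 + \frac{D^2}{4} - \frac{D^4}{6}\right)\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^T\right] \tag{B.23}$$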
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B.5  J 

This  integral  is  somewhat  more  tedious  than  the  previous  ones  due  to  the  (u 
which  does  depend  on  a.  We  start  by  defining 

<) 

and  note  that  from  the  definition  (B.ll) 

=  u  •  hi,  Uy  =  u  -  V2,  and  u'.  =  u  ■  (B.25) 

Since  u  •  £'  =  u'  •  ^  we  can  write 

u-  £’  =  u[.  ^  cos  a  +  Uy  ^  sin  o  +  u'^  (B  .26 ) 

This  scalar  term  multiplies  each  element  of  the  array  producing  many  products 

of  trigonometric  terms.  Fortunately,  only  a  few  of  these  are  non-zero  after  integrating  over 
a  from  0  to  27r.  We  have 

{ 0  ^ 

|■2^T 

/  (u  •  da  =  TT  0  (B.27) 

\  u'y^'^  2ul  ^ 

and  performing  the  integration  over  we  obtain 

/  0  <F>V4  ^ 

j  J  ^  {u  ■£')£' da  =  7rD^K  0  KD^I  u'yD'^/A  R"^ 

i  u'yD^/A  <  j 


■  £')  term 


(B.24) 




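There is a special case which is worth noting. If $\mathbf{u} = \hat{\mathbf{v}}_3$, then $\mathbf{u}' = \hat{\mathbf{z}}$ and equation (B.28) becomes

\[
\int_0^D\!\!\int_0^{2\pi} (\hat{\mathbf{v}}_3\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^{T}\,\xi\,d\xi\,d\alpha
= \pi D^2\left(\frac{D^2}{4}\,I + \left(1 - \frac{D^2}{4}\right)\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right)
= \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}'\boldsymbol{\ell}'^{T}\,\xi\,d\xi\,d\alpha
\tag{B.29}
\]

Multiplying the integrand by $(\hat{\mathbf{v}}_3\cdot\boldsymbol{\ell}')$ thus has no effect on the result, as expected since $\hat{\mathbf{v}}_3\cdot\boldsymbol{\ell}' = \hat{\mathbf{z}}\cdot\boldsymbol{\ell} = 1$ for every ray.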
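Both the matrix form (B.28) and the special case (B.29) lend themselves to a numerical check. The sketch below assumes $\boldsymbol{\ell} = (\xi\cos\alpha,\ \xi\sin\alpha,\ 1)^T$, $\boldsymbol{\ell}' = R\boldsymbol{\ell}$, and $\hat{\mathbf{v}}_3$ equal to the third column of $R$. The function names, the test rotation, the vector $\mathbf{u}$, and the value of $D$ are hypothetical.

```python
import numpy as np

def weighted_integral(R, D, u=None, n=400):
    """Midpoint estimate of the integral of (u . l') l' l'^T xi dxi dalpha.

    With u=None the scalar factor (u . l') is omitted.
    """
    d_xi, d_alpha = D / n, 2.0 * np.pi / n
    xi = (np.arange(n) + 0.5) * d_xi
    alpha = (np.arange(n) + 0.5) * d_alpha
    X, A = np.meshgrid(xi, alpha, indexing="ij")
    ell = np.stack([X * np.cos(A), X * np.sin(A), np.ones_like(X)], axis=-1)
    ell_rot = ell @ R.T                                  # l' = R l
    scalar = np.ones_like(X) if u is None else ell_rot @ np.asarray(u, dtype=float)
    weight = scalar * X * d_xi * d_alpha
    return np.einsum("ij,ijk,ijl->kl", weight, ell_rot, ell_rot)

def closed_form_b28(R, D, u):
    """Right-hand side of (B.28), written with u' = R^T u."""
    ux, uy, uz = R.T @ np.asarray(u, dtype=float)
    q = D**2 / 4
    M = np.array([[uz * q, 0.0, ux * q],
                  [0.0, uz * q, uy * q],
                  [ux * q, uy * q, uz]])
    return np.pi * D**2 * (R @ M @ R.T)

# Arbitrary test rotation: about x by 1.1 rad, then about z by 0.4 rad.
c, s = np.cos(0.4), np.sin(0.4)
c2, s2 = np.cos(1.1), np.sin(1.1)
R = (np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
     @ np.array([[1.0, 0.0, 0.0], [0.0, c2, -s2], [0.0, s2, c2]]))
D, u = 0.5, np.array([0.3, -0.2, 0.9])                   # illustrative values

print(np.max(np.abs(weighted_integral(R, D, u) - closed_form_b28(R, D, u))))
print(np.max(np.abs(weighted_integral(R, D, R[:, 2]) - weighted_integral(R, D))))
```

The first printed value tests (B.28) for a general $\mathbf{u}$; the second tests the statement of (B.29) that weighting by $(\hat{\mathbf{v}}_3\cdot\boldsymbol{\ell}')$ leaves the integral unchanged.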


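B.6  $\int\!\!\int (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^{T}\,\xi\,d\xi\,d\alpha$

In this final integral even more complex trigonometric products are encountered. Expanding the term $(\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')$ we have

\[
\begin{aligned}
(\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')
&= \mathbf{u}_1'^{T}\,\boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\mathbf{u}_2' \\
&= u'_{1x}u'_{2x}\,\xi^2\cos^2\alpha + u'_{1y}u'_{2y}\,\xi^2\sin^2\alpha + u'_{1z}u'_{2z} \\
&\quad + \left(u'_{1x}u'_{2y} + u'_{2x}u'_{1y}\right)\xi^2\cos\alpha\sin\alpha
 + \left(u'_{1x}u'_{2z} + u'_{2x}u'_{1z}\right)\xi\cos\alpha \\
&\quad + \left(u'_{1y}u'_{2z} + u'_{2y}u'_{1z}\right)\xi\sin\alpha
\end{aligned}
\tag{B.30}
\]

where $\mathbf{u}_1' = R^{T}\mathbf{u}_1$ and $\mathbf{u}_2' = R^{T}\mathbf{u}_2$. The integral is written as

\[
\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^{T}\,\xi\,d\xi\,d\alpha
= R\left(\int_0^D\!\!\int_0^{2\pi} \left(\mathbf{u}_1'^{T}\,\boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\mathbf{u}_2'\right)\boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\xi\,d\xi\,d\alpha\right)R^{T}
\tag{B.31}
\]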

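so that each element of the matrix $\boldsymbol{\ell}\boldsymbol{\ell}^{T}$ is multiplied by the scalar $\mathbf{u}_1'^{T}\,\boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\mathbf{u}_2'$. The only trigonometric products which survive the integration over $\alpha$, however, are the following:

\begin{align}
\int_0^{2\pi} \cos^4\alpha\,d\alpha &= \frac{3\pi}{4} \tag{B.32} \\
\int_0^{2\pi} \sin^4\alpha\,d\alpha &= \frac{3\pi}{4} \tag{B.33} \\
\int_0^{2\pi} \cos^2\alpha\sin^2\alpha\,d\alpha &= \frac{\pi}{4} \tag{B.34} \\
\int_0^{2\pi} \cos^2\alpha\,d\alpha &= \pi \tag{B.35} \\
\int_0^{2\pi} \sin^2\alpha\,d\alpha &= \pi \tag{B.36} \\
\int_0^{2\pi} d\alpha &= 2\pi \tag{B.37}
\end{align}

We now define the following integrals:

\[
L_1 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\xi^2\cos^2\alpha\;\xi\,d\xi\,d\alpha
= \frac{\pi D^4}{4}
\begin{pmatrix}
D^2/2 & 0 & 0 \\
0 & D^2/6 & 0 \\
0 & 0 & 1
\end{pmatrix}
\tag{B.38}
\]

\[
L_2 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\xi^2\sin^2\alpha\;\xi\,d\xi\,d\alpha
= \frac{\pi D^4}{4}
\begin{pmatrix}
D^2/6 & 0 & 0 \\
0 & D^2/2 & 0 \\
0 & 0 & 1
\end{pmatrix}
\tag{B.39}
\]

\[
L_3 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\xi\,d\xi\,d\alpha
= \pi D^2
\begin{pmatrix}
D^2/4 & 0 & 0 \\
0 & D^2/4 & 0 \\
0 & 0 & 1
\end{pmatrix}
\tag{B.40}
\]

\[
L_4 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\xi^2\cos\alpha\sin\alpha\;\xi\,d\xi\,d\alpha
= \frac{\pi D^6}{24}
\begin{pmatrix}
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}
\tag{B.41}
\]

\[
L_5 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\xi\cos\alpha\;\xi\,d\xi\,d\alpha
= \frac{\pi D^4}{4}
\begin{pmatrix}
0 & 0 & 1 \\
0 & 0 & 0 \\
1 & 0 & 0
\end{pmatrix}
\tag{B.42}
\]

\[
L_6 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^{T}\,\xi\sin\alpha\;\xi\,d\xi\,d\alpha
= \frac{\pi D^4}{4}
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0
\end{pmatrix}
\]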
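The six weighted integrals $L_1,\ldots,L_6$ can be evaluated numerically to confirm which matrix elements survive. The sketch below assumes $\boldsymbol{\ell} = (\xi\cos\alpha,\ \xi\sin\alpha,\ 1)^T$ and that each $L_k$ is the integral of $\boldsymbol{\ell}\boldsymbol{\ell}^T$ times one of the six scalar factors in (B.30), with area element $\xi\,d\xi\,d\alpha$. The closed forms coded here follow from the trigonometric integrals (B.32) to (B.37); the function names and the value of $D$ are hypothetical.

```python
import numpy as np

def weighted_ell_outer(D, weight_fn, n=400):
    """Midpoint estimate of the integral of l l^T * weight_fn(xi, alpha) * xi."""
    d_xi, d_alpha = D / n, 2.0 * np.pi / n
    xi = (np.arange(n) + 0.5) * d_xi
    alpha = (np.arange(n) + 0.5) * d_alpha
    X, A = np.meshgrid(xi, alpha, indexing="ij")
    ell = np.stack([X * np.cos(A), X * np.sin(A), np.ones_like(X)], axis=-1)
    w = weight_fn(X, A) * X * d_xi * d_alpha
    return np.einsum("ij,ijk,ijl->kl", w, ell, ell)

# Scalar factors multiplying l l^T in L1..L6, in that order.
WEIGHTS = [
    lambda x, a: x**2 * np.cos(a)**2,
    lambda x, a: x**2 * np.sin(a)**2,
    lambda x, a: np.ones_like(x),
    lambda x, a: x**2 * np.cos(a) * np.sin(a),
    lambda x, a: x * np.cos(a),
    lambda x, a: x * np.sin(a),
]

def closed_forms(D):
    """Closed-form values of L1..L6 derived from (B.32)-(B.37)."""
    p4, p6 = np.pi * D**4 / 4, np.pi * D**6 / 24
    return [
        p4 * np.diag([D**2 / 2, D**2 / 6, 1.0]),
        p4 * np.diag([D**2 / 6, D**2 / 2, 1.0]),
        np.pi * D**2 * np.diag([D**2 / 4, D**2 / 4, 1.0]),
        p6 * np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
        p4 * np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
        p4 * np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]),
    ]

D = 0.5  # illustrative value only
for k, (fn, exact) in enumerate(zip(WEIGHTS, closed_forms(D)), start=1):
    print(f"L{k}: max error = {np.max(np.abs(weighted_ell_outer(D, fn) - exact)):.2e}")
```

Each reported error should be at the level of the quadrature error, which confirms both the surviving elements and their prefactors.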
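To simplify notation, we define the vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ as

\[
\mathbf{w}_1 = (\hat{\mathbf{v}}_3 \times \mathbf{u}_1) \times \hat{\mathbf{v}}_3 = \left(I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right)\mathbf{u}_1,
\quad\text{and}\quad
\mathbf{w}_2 = (\hat{\mathbf{v}}_3 \times \mathbf{u}_2) \times \hat{\mathbf{v}}_3 = \left(I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right)\mathbf{u}_2
\tag{B.47}
\]

Since $\hat{\mathbf{v}}_3\cdot\hat{\mathbf{v}}_3 = 1$ we also note that

\[
\mathbf{w}_1\cdot\mathbf{w}_2 = \mathbf{u}_1\cdot\mathbf{u}_2 - (\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)
\tag{B.48}
\]

Equation (B.46) can thus be reduced to

\[
\frac{\pi D^6}{8}\,R
\begin{pmatrix}
u'_{1x}u'_{2x} + u'_{1y}u'_{2y}/3 & \left(u'_{1x}u'_{2y} + u'_{2x}u'_{1y}\right)/3 & 0 \\
\left(u'_{1x}u'_{2y} + u'_{2x}u'_{1y}\right)/3 & u'_{1x}u'_{2x}/3 + u'_{1y}u'_{2y} & 0 \\
0 & 0 & 0
\end{pmatrix}
R^{T}
= \frac{\pi D^6}{24}\left[\mathbf{w}_1\mathbf{w}_2^{T} + \mathbf{w}_2\mathbf{w}_1^{T} + (\mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right)\right]
\tag{B.49}
\]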
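The identities (B.47) to (B.49) are purely algebraic, so they can be verified for arbitrary vectors without any quadrature. The sketch below assumes that $\hat{\mathbf{v}}_3$ is the third column of an orthonormal $R$ and that $\mathbf{u}_i' = R^T\mathbf{u}_i$. The function name and the test values are hypothetical.

```python
import numpy as np

def check_w_identities(R, u1, u2, D):
    """Return the maximum residuals of (B.47), (B.48) and (B.49)."""
    v3 = R[:, 2]
    P = np.eye(3) - np.outer(v3, v3)            # projector orthogonal to v3

    # (B.47): double cross product versus projection.
    w1 = np.cross(np.cross(v3, u1), v3)
    w2 = np.cross(np.cross(v3, u2), v3)
    r47 = max(np.max(np.abs(w1 - P @ u1)), np.max(np.abs(w2 - P @ u2)))

    # (B.48): dot product of the projected vectors.
    r48 = abs(w1 @ w2 - (u1 @ u2 - (u1 @ v3) * (u2 @ v3)))

    # (B.49): matrix form in the rotated frame versus vector form.
    p1, p2 = R.T @ u1, R.T @ u2
    a, b = p1[0] * p2[0], p1[1] * p2[1]
    c = p1[0] * p2[1] + p2[0] * p1[1]
    M = np.array([[a + b / 3, c / 3, 0.0],
                  [c / 3, a / 3 + b, 0.0],
                  [0.0, 0.0, 0.0]])
    lhs = (np.pi * D**6 / 8) * (R @ M @ R.T)
    rhs = (np.pi * D**6 / 24) * (np.outer(w1, w2) + np.outer(w2, w1) + (w1 @ w2) * P)
    r49 = np.max(np.abs(lhs - rhs))
    return r47, r48, r49

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthonormal matrix
u1, u2 = rng.standard_normal(3), rng.standard_normal(3)
print(check_w_identities(R, u1, u2, D=0.5))
```

All three residuals should be at the level of floating-point rounding. Only the orthonormality of the columns of $R$ is used, so an orthonormal matrix from a QR factorization is sufficient for this check.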

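The second matrix of equation (B.45) can be written as

\[
\begin{aligned}
\frac{\pi D^4}{4}\,R &
\begin{pmatrix}
u'_{1z}u'_{2z} & 0 & u'_{1x}u'_{2z} + u'_{2x}u'_{1z} \\
0 & u'_{1z}u'_{2z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} \\
u'_{1x}u'_{2z} + u'_{2x}u'_{1z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} & \mathbf{u}_1'\cdot\mathbf{u}_2'
\end{pmatrix}
R^{T} \\
&= \frac{\pi D^4}{4}\,R\Big[u'_{2z}\left(\mathbf{u}_1'\hat{\mathbf{z}}^{T} + \hat{\mathbf{z}}\mathbf{u}_1'^{T}\right) + u'_{1z}\left(\mathbf{u}_2'\hat{\mathbf{z}}^{T} + \hat{\mathbf{z}}\mathbf{u}_2'^{T}\right) \\
&\qquad - 4u'_{1z}u'_{2z}\,\hat{\mathbf{z}}\hat{\mathbf{z}}^{T} + u'_{1z}u'_{2z}\left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^{T}\right) + (\mathbf{u}_1'\cdot\mathbf{u}_2')\,\hat{\mathbf{z}}\hat{\mathbf{z}}^{T}\Big]R^{T} \\
&= \frac{\pi D^4}{4}\Big[(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\left(\mathbf{u}_1\hat{\mathbf{v}}_3^{T} + \hat{\mathbf{v}}_3\mathbf{u}_1^{T}\right) + (\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)\left(\mathbf{u}_2\hat{\mathbf{v}}_3^{T} + \hat{\mathbf{v}}_3\mathbf{u}_2^{T}\right) \\
&\qquad - 4(\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T} + (\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\left(I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right) \\
&\qquad + (\mathbf{u}_1\cdot\mathbf{u}_2)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\Big]
\end{aligned}
\tag{B.50}
\]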


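From (B.47) we can derive the following expression for $\mathbf{w}_1\mathbf{w}_2^{T} + \mathbf{w}_2\mathbf{w}_1^{T}$:

\[
\begin{aligned}
\mathbf{w}_1\mathbf{w}_2^{T} + \mathbf{w}_2\mathbf{w}_1^{T}
&= \mathbf{u}_1\mathbf{u}_2^{T} + \mathbf{u}_2\mathbf{u}_1^{T} - (\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\left(\mathbf{u}_1\hat{\mathbf{v}}_3^{T} + \hat{\mathbf{v}}_3\mathbf{u}_1^{T}\right) \\
&\quad - (\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)\left(\mathbf{u}_2\hat{\mathbf{v}}_3^{T} + \hat{\mathbf{v}}_3\mathbf{u}_2^{T}\right) + 2(\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}
\end{aligned}
\tag{B.51}
\]

Then, using (B.48) and (B.51), we can rewrite equation (B.50) as:

\[
\begin{aligned}
\frac{\pi D^4}{4}\,R &
\begin{pmatrix}
u'_{1z}u'_{2z} & 0 & u'_{1x}u'_{2z} + u'_{2x}u'_{1z} \\
0 & u'_{1z}u'_{2z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} \\
u'_{1x}u'_{2z} + u'_{2x}u'_{1z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} & \mathbf{u}_1'\cdot\mathbf{u}_2'
\end{pmatrix}
R^{T} \\
&= \frac{\pi D^4}{4}\Big[\mathbf{u}_1\mathbf{u}_2^{T} + \mathbf{u}_2\mathbf{u}_1^{T} - \mathbf{w}_1\mathbf{w}_2^{T} - \mathbf{w}_2\mathbf{w}_1^{T} - 2(\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T} \\
&\qquad + (\mathbf{u}_1\cdot\mathbf{u}_2 - \mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right) + (\mathbf{u}_1\cdot\mathbf{u}_2)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\Big] \\
&= \frac{\pi D^4}{4}\Big[\mathbf{u}_1\mathbf{u}_2^{T} + \mathbf{u}_2\mathbf{u}_1^{T} + (\mathbf{u}_1\cdot\mathbf{u}_2)\,I - 2(\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T} \\
&\qquad - \left(\mathbf{w}_1\mathbf{w}_2^{T} + \mathbf{w}_2\mathbf{w}_1^{T} + (\mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right)\right)\Big]
\end{aligned}
\tag{B.52}
\]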

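Finally, we combine equations (B.49) and (B.52) with the third term of equation (B.45) to obtain the result:

\[
\begin{aligned}
\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^{T}\,\xi\,d\xi\,d\alpha
&= \frac{\pi D^4}{4}\left(\frac{D^2}{6} - 1\right)\left[\mathbf{w}_1\mathbf{w}_2^{T} + \mathbf{w}_2\mathbf{w}_1^{T} + (\mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right)\right] \\
&\quad + \frac{\pi D^4}{4}\left[\mathbf{u}_1\mathbf{u}_2^{T} + \mathbf{u}_2\mathbf{u}_1^{T} + (\mathbf{u}_1\cdot\mathbf{u}_2)\,I\right] \\
&\quad + \pi D^2\left(1 - \frac{3D^2}{4}\right)(\mathbf{u}_1\cdot\hat{\mathbf{v}}_3)(\mathbf{u}_2\cdot\hat{\mathbf{v}}_3)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}
\end{aligned}
\tag{B.53}
\]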
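Because (B.53) collects contributions from several intermediate matrices, a direct numerical comparison is a useful guard against sign and coefficient slips. The sketch below assumes $\boldsymbol{\ell} = (\xi\cos\alpha,\ \xi\sin\alpha,\ 1)^T$, $\boldsymbol{\ell}' = R\boldsymbol{\ell}$, $\hat{\mathbf{v}}_3$ equal to the third column of $R$, and $\mathbf{w}_i = (I - \hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^T)\mathbf{u}_i$ as in (B.47). All function names and test values are hypothetical.

```python
import numpy as np

def numeric_b53(R, D, u1, u2, n=400):
    """Midpoint estimate of the integral of (u1.l')(u2.l') l' l'^T xi dxi dalpha."""
    d_xi, d_alpha = D / n, 2.0 * np.pi / n
    xi = (np.arange(n) + 0.5) * d_xi
    alpha = (np.arange(n) + 0.5) * d_alpha
    X, A = np.meshgrid(xi, alpha, indexing="ij")
    ell = np.stack([X * np.cos(A), X * np.sin(A), np.ones_like(X)], axis=-1)
    ell_rot = ell @ R.T                                   # l' = R l
    w = (ell_rot @ u1) * (ell_rot @ u2) * X * d_xi * d_alpha
    return np.einsum("ij,ijk,ijl->kl", w, ell_rot, ell_rot)

def closed_form_b53(R, D, u1, u2):
    """Right-hand side of (B.53)."""
    v3 = R[:, 2]
    I = np.eye(3)
    P = I - np.outer(v3, v3)
    w1, w2 = P @ u1, P @ u2                               # (B.47)
    W = np.outer(w1, w2) + np.outer(w2, w1) + (w1 @ w2) * P
    U = np.outer(u1, u2) + np.outer(u2, u1) + (u1 @ u2) * I
    return ((np.pi * D**4 / 4) * (D**2 / 6 - 1) * W
            + (np.pi * D**4 / 4) * U
            + np.pi * D**2 * (1 - 3 * D**2 / 4)
              * (u1 @ v3) * (u2 @ v3) * np.outer(v3, v3))

# Arbitrary test rotation: about x by 1.1 rad, then about z by 0.4 rad.
c, s = np.cos(0.4), np.sin(0.4)
c2, s2 = np.cos(1.1), np.sin(1.1)
R = (np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
     @ np.array([[1.0, 0.0, 0.0], [0.0, c2, -s2], [0.0, s2, c2]]))
u1 = np.array([0.3, -0.5, 0.8])                           # illustrative vectors
u2 = np.array([-0.2, 0.9, 0.4])
D = 0.5                                                   # illustrative value

print(np.max(np.abs(numeric_b53(R, D, u1, u2) - closed_form_b53(R, D, u1, u2))))
```

The printed discrepancy should be at the level of the quadrature error. Setting `u1 = R[:, 2]` in the same script gives a check of the special case in which one of the vectors equals $\hat{\mathbf{v}}_3$.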


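We note that a special case occurs if either $\mathbf{u}_1$ or $\mathbf{u}_2$ is equal to $\hat{\mathbf{v}}_3$. The solution then becomes

\[
\begin{aligned}
\int_0^D\!\!\int_0^{2\pi} (\hat{\mathbf{v}}_3\cdot\boldsymbol{\ell}')(\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^{T}\,\xi\,d\xi\,d\alpha
&= \pi D^2\left[\frac{D^2}{4}\left(\mathbf{u}\hat{\mathbf{v}}_3^{T} + \hat{\mathbf{v}}_3\mathbf{u}^{T}\right) + (\mathbf{u}\cdot\hat{\mathbf{v}}_3)\frac{D^2}{4}\,I + \left(1 - \frac{3D^2}{4}\right)(\mathbf{u}\cdot\hat{\mathbf{v}}_3)\,\hat{\mathbf{v}}_3\hat{\mathbf{v}}_3^{T}\right] \\
&= \int_0^D\!\!\int_0^{2\pi} (\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^{T}\,\xi\,d\xi\,d\alpha
\end{aligned}
\tag{B.54}
\]

Again, multiplying the integrand by $(\hat{\mathbf{v}}_3\cdot\boldsymbol{\ell}')$ has no effect on the result.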


